Edinburgh 2007/25 



Measuring Supersymmetry 

Rémi Lafaye,^1 Tilman Plehn,^2 Michael Rauch,^2 and Dirk Zerwas^3

^1 LAPP, Université Savoie, IN2P3/CNRS, Annecy, France
^2 SUPA, School of Physics, University of Edinburgh, Scotland
^3 LAL, Université Paris-Sud, IN2P3/CNRS, Orsay, France
(Dated: February 2, 2008) 

Abstract 

If new physics is found at the LHC (and the ILC) the reconstruction of the underlying theory 
should not be biased by assumptions about high-scale models. For the mapping of many mea- 
surements onto high-dimensional parameter spaces we introduce SFitter with its new weighted 
Markov chain technique. SFitter constructs an exclusive likelihood map, determines the best- 
fitting parameter point and produces a ranked list of the most likely parameter points. Using the 
example of the TeV-scale supersymmetric Lagrangian we show how a high-dimensional likelihood 
map will generally include degeneracies and strong correlations. SFitter allows us to study such 
model-parameter spaces employing Bayesian as well as frequentist constructions. We illustrate in 
detail how it should be possible to analyze high-dimensional new-physics parameter spaces like 
the TeV-scale MSSM at the LHC. A combination of LHC and ILC measurements might well be 
able to completely cover highly complex TeV-scale parameter spaces. 
I. NEW PHYSICS AT THE TEV SCALE



In the coming years, the major effort in high-energy physics will be the search for a Higgs 
boson or an alternative to such a fundamental Higgs scalar at the LHC. However, funda- 
mental scalars are difficult to accommodate in field theory — their masses are quadratically 
divergent with the cutoff scale of the theory. This problem naturally leads to speculations 
about the necessary ultraviolet completion of the Standard Model, which should remove
such quadratic divergences and allow us to extrapolate our understanding, perhaps even up to the
Planck scale. Such an ultraviolet completion can (and should) at the same time solve the
second big mystery of high-energy physics, the existence of cold dark matter. 

An overwhelming amount of data on possible ultraviolet completions of the Standard 
Model has been amassed over the past decades, consistently confirming the Standard Model. 
LEP and Tevatron have put stringent bounds on the masses of new particles, cutting into the
preferred region for example for supersymmetric dark matter [1], not only via the derived
light Higgs mass, but also via direct searches. The anomalous magnetic moment of the
muon may or may not seriously threaten the Standard Model, but it will certainly disfavor
many possible interpretations of LHC signatures. Flavor physics leads to the postulation
of additional symmetries in ultraviolet completions, an example being supersymmetry.
And last but not least, the measured relic density of the dark matter agent puts very
stringent constraints not only on the mass and coupling of such a candidate, but also on
other particles involved in the annihilation process or in its (direct or indirect) detection.

Many new-physics scenarios do not simply predict a new narrow resonance, such as for
example a Z'. Instead, a wealth of measurements at the LHC, and later on at the ILC and
other experiments, might become available, and with it the need to combine them properly. The
situation could be similar to current fits of electroweak precision data, but most likely it
will be much more complex. The LHC era with all its experiments can give a great many
hints about new-physics scenarios, and it will certainly rule out large classes of extensions of
the Standard Model, but it will definitely not give a one-to-one map between a limited
number of observables and a well-defined small set of model parameters.

Bayesian probability distributions and frequentist profile likelihoods are two ways to 
study an imperfectly measured parameter space, where some model parameters might be 
very well determined, others heavily correlated, and even others basically unconstrained. 
This situation is different for example from B physics, where theoretical degeneracies and
symmetries have become a major challenge. A careful comparison of the benefits and
traps of the frequentist and the Bayesian approaches in the light of new-physics searches is
therefore necessary. SFitter follows both paths.

If heavy strongly interacting particles can be produced at the LHC, they will decay into
lighter weakly interacting particles and finally into the dark matter candidate [11, 12],
with decay cascades longer than the top-quark decay chain. These cascade measurements
not only carry information on the masses of the particles involved. The angular correlations
also reflect the spins of the particles in the cascade and allow tests for example of the SUSY
hypothesis against an extra-dimensional hypothesis. At the ILC, detailed analyses of
kinematically accessible particles will be possible, for example using threshold scans [13].
Masses, branching ratios as well as measurements of particle spins will shed additional light
on the underlying theory. Currently, no attempt is made to measure discrete quantum numbers
of new-physics particles using SFitter. Instead the analysis is limited to the continuous
space of the parameters.



In this paper the analysis will be restricted to the parameter point SPS1a [15], only
because it has been studied in detail by the experimental communities at the LHC and
ILC. After briefly reviewing the experimental results and the treatment of the experimental
and theoretical errors in SFitter, the relevant features of these measurements as well as the
approach of SFitter [16] will be illustrated in the MSUGRA model, before moving on to the
weak-scale MSSM.

The organization of the MSUGRA and the MSSM sections follows the general logic 
of SFitter: First, a fully exclusive log-likelihood map of the respective parameter space is 
constructed and a ranked list of the best-fitting points is produced. We will see that already 
for the MSUGRA parameter space the LHC measurements will lead to strong correlations 
and alternative likelihood maxima. The situation will become more complex in the case of 
the MSSM, where equally good alternative best-fitting points are induced by the structure 
of the gaugino-higgsino mass parameters, the sign of the higgsino mass parameter, and the 
correlations between the trilinear coupling in the top sector and the top-quark mass. This 
degeneracy would have to be broken by additional measurements, at the LHC or elsewhere. 

Starting from the log-likelihood map we then use frequentist and Bayesian constructions 
to study lower-dimensional probability distributions including correlation effects. Again, 
this analysis illustrates the complex structure of the MSSM parameter space as well as the 
features of the statistical methods employed. Finally, the weak-scale MSSM Lagrangian
is reconstructed with proper experimental and theory error distributions. This weak-scale 
result should serve as a starting point to probe supersymmetry breaking bottom-up without 
theoretical bias. In the appendices we discuss the techniques of SFitter using a simple toy 
model. 



The approach of mapping measurements onto a high-dimensional parameter space as well 
as the SFitter tool are completely general^: model parameters as well as measurements are 
included in the form of model files and can simply be replaced. SFitter serves as a general 
tool to map typically up to 20-dimensional highly complex parameter spaces onto a large 
sample of highly correlated measurements of different quality. 



II. COLLIDER DATA 



The analysis in this paper critically depends on detailed experimental simulations of mea- 
surements and errors at the LHC and at the ILC. Therefore the well-understood parameter 
point SPS1a [15] is used. This point has a favorable phenomenology for both LHC and
ILC. The original version SPS1a instead of the dark-matter corrected SPA1/SPS1a' point
is used, since cosmological measurements like the relic density are not part of this work [18].



^ Fittino [17] follows a very similar logic to SFitter, including a scan of the high-dimensional MSSM
parameter space.
A. LHC and ILC measurements



The parameter point SPS1a is characterized by moderately heavy squarks and gluinos,
which leads to long cascades including the neutralinos and sleptons. Gauginos are lighter
than Higgsinos, and the mass of the lightest Higgs boson is close to the mass limit determined
at LEP. The summary of particle mass measurements is listed in Table I, taken from Ref. [19].
The central values are calculated by SuSpect [13]. In general, we see from the table that
the LHC has the advantage of a better coverage of the strongly interacting sparticle sector,
whereas a somewhat better coverage and precision in particular in the gaugino sector can be
obtained with the ILC [21]. It should be noted that the quoted LHC mass measurements
are obtained from measurements of kinematical endpoints and mass differences,
the observables shown in Table II. The systematic error quoted in these measurements
is essentially due to the uncertainty in the lepton and jet energy scales, expected to be
0.1% and 1%, respectively, at the LHC. These energy-scale errors are each taken to be 99%
correlated, as discussed in Ref. [19].



Precision mass measurements at the LHC are not possible from the measurement of production
rates of certain final states, i.e. combinations of ($\sigma \cdot$ BR). The reason lies in the sizeable
QCD uncertainties on the cross section [25], often largely due to gluon radiation from the initial
state, but by no means restricted to this one aspect of higher-order corrections. Generic
errors on the cross section alone of at least 20%, plus errors due to detector efficiencies and
coverage, imply that one would only rely on ($\sigma \cdot$ BR)-type information in the absence of other
useful measurements [25]. For such cases, the next-to-leading order production rates for
strongly interacting sparticles (based on Prospino2) are implemented in SFitter and
can be readily included in the analysis. The same is true for the branching ratios, where
interfaces to MSMlib [13] and Sdecay/S-HIT are implemented. The QCD corrections
to measurements of the decay kinematics are known to be under control: additional jet
radiation is well described by shower Monte Carlos [29] and will not lead to unexpected
QCD effects. Off-shell effects in cascade decays can of course be large once particles become
almost mass degenerate [30], but in the standard SPS1a cascades these effects are expected
to be small.



For the ILC, as a rule of thumb, if particles are light enough to be produced in pairs
given the center-of-mass energy of the collider, their mass can be determined with impressive
accuracy. The mass determination is possible either through direct reconstruction or through
a measurement of the cross section at production threshold, with comparable accuracy but
different systematics. Precision measurements of the branching ratios, e.g. of the Higgs
boson, are possible. Additionally, discrete quantum numbers like the spin of the particles
can be determined similarly well.



B. Error determination 



In order to obtain reliable error estimates for the fundamental parameters, a proper 
treatment of experimental and theory errors depending on their origin is mandatory. The 
CKMfitter prescription is largely followed. The complete set of errors in the MSUGRA as 
well as in the MSSM analysis includes statistical experimental errors, systematic experimental
errors, and theory errors. The statistical experimental errors are treated as uncorrelated
in the measured observables. In contrast, the systematic experimental errors, for example
from the jet and lepton energy scales, are fully correlated. Hence, both are non-trivially
correlated in the masses determined from the endpoints. Theory errors are propagated from
the masses to the measurements.

TABLE I: Errors for the mass determination in SPS1a, taken from [19]. Shown are the nominal
parameter values (from SuSpect), the error for the LHC alone, from the LC alone, and from a
combined LHC+LC analysis. Empty boxes indicate that the particle cannot, to current knowledge,
be observed or is too heavy to be produced. All values are given in GeV.



As there is no reason why unknown higher-order corrections should be centered around 
a given value or even around zero, the theory error of the weak-scale masses is not taken 
to be gaussian but flat box-shaped: the probability assigned to any measurement does not 
depend on its actual value, as long as it is within the interval covered by the theory error. 
A tail could be attached to these theory-error distributions, but higher-order corrections 
are precisely not expected to become arbitrarily large. Confronted with a perturbatively 
unstable observable one would instead have to rethink the perturbative description of the 
underlying theory. 

Taking this interval approach seriously impacts not only the distribution of the theory 
error, but also its combination with the combined (gaussian) experimental error. A simple 
convolution of a box-shaped theory error with a gaussian experimental error leads to the 
difference of two one-sided error functions. Numerically, this function will have a maximum, 
so the convolution still knows about the central value of theoretical prediction. On the other 
hand, the function is never flat and differentiable to arbitrarily high orders at all points. 
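
The shape of this convolution can be checked numerically. The following is only a minimal sketch: the function name and the example numbers are illustrative assumptions and are not part of SFitter. It evaluates the convolution of a unit-area box of half-width sigma_theo with a gaussian of width sigma_exp as a difference of two error functions.

```python
import math


def box_gauss_convolution(d, d_pred, sigma_theo, sigma_exp):
    """Convolution of a unit-area box (half-width sigma_theo, centred on the
    prediction d_pred) with a gaussian of width sigma_exp, evaluated at d.

    Hypothetical helper for illustration only."""
    scale = math.sqrt(2.0) * sigma_exp
    upper = math.erf((d - d_pred + sigma_theo) / scale)
    lower = math.erf((d - d_pred - sigma_theo) / scale)
    return (upper - lower) / (4.0 * sigma_theo)


if __name__ == "__main__":
    # Assumed example: prediction 100 GeV, theory error 2 GeV, experimental error 1 GeV.
    for offset in (0.0, 0.5, 1.0, 2.0, 3.0):
        value = box_gauss_convolution(100.0 + offset, 100.0, 2.0, 1.0)
        print(f"offset {offset:3.1f} GeV -> density {value:.4f}")
```

Even inside the theory interval the density decreases monotonically away from the central prediction, which is the behaviour criticized in the text.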

TABLE II: LHC measurements in SPS1a, taken from [19]. Shown are the nominal values (from
SuSpect) and statistical errors, systematic errors from the lepton (LES) and jet energy scale (JES)
and theoretical errors. All values are given in GeV.

A better solution is a distribution which is flat as long as the measured value is within
the theoretically acceptable interval and outside this interval drops off like a gaussian with
the width of the experimental error. The log-likelihood $\chi^2 = -2 \log \mathcal{L}$ given a set of
measurements $d$ and in the presence of a general correlation matrix $C$ reads

\[
\chi^2 = \chi_d^T \, C^{-1} \, \chi_d , \qquad
\chi_{d,i} =
\begin{cases}
0 & |d_i - \bar{d}_i| < \sigma_i^{(theo)} \\[1mm]
\dfrac{|d_i - \bar{d}_i| - \sigma_i^{(theo)}}{\sigma_i^{(exp)}} & |d_i - \bar{d}_i| > \sigma_i^{(theo)}
\end{cases}
\tag{1}
\]

where $\bar{d}_i$ is the i-th data point predicted by the model parameters and $d_i$ the actual
measurement. This definition corresponds to the RFit scheme described in Ref. [7]. The experimental
errors are considered to be gaussian, so they are summed quadratically. The statistical error 
is assumed to be uncorrelated between different measurements. The first systematic error
$\sigma^{(\ell)}$ originates from the lepton energy scale and is taken as 99% correlated between two
measurements. Correspondingly, $\sigma^{(j)}$ stems from the jet energy scale and is also 99% correlated.
The correlations are absorbed into the correlation matrix $C$:

\[
C_{i,i} = 1 , \qquad
C_{i,j} = \frac{0.99 \, \sigma_i^{(\ell)} \sigma_j^{(\ell)} + 0.99 \, \sigma_i^{(j)} \sigma_j^{(j)}}
               {\sigma_i^{(exp)} \sigma_j^{(exp)}} \quad (i \neq j) .
\tag{2}
\]
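
As an illustration of how such a log-likelihood could be evaluated, the sketch below assumes the flat-core, gaussian-tail form described in the text together with a correlation matrix built from 99% correlated lepton and jet energy-scale errors. All function names are hypothetical, the linear solver is a plain Gaussian elimination chosen to keep the example self-contained, and nothing here reproduces the actual SFitter code.

```python
import math


def experimental_errors(stat, lepton, jet):
    """Gaussian experimental errors, summed in quadrature."""
    return [math.sqrt(s * s + l * l + j * j) for s, l, j in zip(stat, lepton, jet)]


def correlation_matrix(stat, lepton, jet, rho=0.99):
    """Unit diagonal; off-diagonal terms from the shared energy-scale errors."""
    sigma_exp = experimental_errors(stat, lepton, jet)
    n = len(sigma_exp)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = lepton[i] * lepton[j] + jet[i] * jet[j]
                corr[i][j] = rho * shared / (sigma_exp[i] * sigma_exp[j])
    return corr


def chi_vector(measured, predicted, sigma_theo, sigma_exp):
    """Zero inside the theory interval, gaussian pull outside of it."""
    chi = []
    for d, d_bar, s_th, s_exp in zip(measured, predicted, sigma_theo, sigma_exp):
        distance = abs(d - d_bar)
        chi.append(0.0 if distance < s_th else (distance - s_th) / s_exp)
    return chi


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(a[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (a[i][n] - partial) / a[i][i]
    return x


def chi_squared(measured, predicted, sigma_theo, stat, lepton, jet):
    """chi^2 = chi^T C^{-1} chi for the flat-core error prescription."""
    sigma_exp = experimental_errors(stat, lepton, jet)
    chi = chi_vector(measured, predicted, sigma_theo, sigma_exp)
    weighted = solve(correlation_matrix(stat, lepton, jet), chi)
    return sum(c * w for c, w in zip(chi, weighted))
```

A prediction inside the theory interval contributes nothing to the sum, so the result is exactly constant over the flat region, which is the property that troubles gradient-based minimizers.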



While box-shaped error distributions for observables are conceptually no problem, they 
lead to a technical complication with hill-climbing algorithms. All functions used to describe 
such a box-shaped distribution will have a discontinuity of higher derivatives in at least one 
point. The prescription above has a step in the second derivative at $\bar{d} \pm \sigma^{(theo)}$, which leads
to a problem for example with Minuit's Migrad algorithm. Details on this problem are given
in the Appendix.

A second complication with flat distributions is that in the central region the log-likelihood
is a constant as a function of some model parameters. In those regions these
parameters vanish from the counting of degrees of freedom. For all results shown in this
paper flat theory errors are assumed, unless stated otherwise. Results with different theory
errors are discussed in Sec. III B.

To determine the errors on the fundamental parameters two techniques are used: a direct 
determination for the best fit using Minuit and a statistical approach using sets of toy 
measurements. The advantage of Minuit is that only one fit is necessary to determine the 
errors. For the non-gaussian error definition used above only Minos (of Minuit) can be used,
as it determines the $\Delta\chi^2 = 1$ intervals without assuming gaussian errors. However, there is a
complication because of the flat region. Its algorithm computes the second derivative of the
log-likelihood for example in its convergence criterion. This second derivative has two steps 
precisely in the region where one would expect the algorithm to converge. Therefore the 
Minos algorithm may not perform well with flat error distributions in the log-likelihood. 

SFitter provides the option to smear the input measurement sets according to their er- 
rors, taking into account the error form (flat or gaussian) and the correlations e.g. of the 
systematic energy scale errors. For each of the smeared toy-measurement sets SFitter deter- 
mines the best-fit value. The width of the distribution of the best-fit values of a parameter 
gives the error on this parameter. This option is time consuming (many fits are needed), 
but necessary to be able to obtain the correct confidence level intervals. Hence, this is the 
method used to determine the parameter errors whenever flat theory errors are assumed. 
For other cases this smearing technique can be used as a cross-check. 
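
The toy-measurement procedure can be sketched as follows. This is only an illustration under strong simplifying assumptions: a single model parameter, observables that are linear in it, no correlations between measurements, and a closed-form gaussian least-squares fit standing in for the full SFitter fit. All names and numbers are hypothetical.

```python
import random
import statistics


def best_fit(measured, slopes, sigma_exp):
    """Weighted least-squares estimate of m for observables d_k = slope_k * m."""
    numerator = sum(a * d / s ** 2 for a, d, s in zip(slopes, measured, sigma_exp))
    denominator = sum(a * a / s ** 2 for a, s in zip(slopes, sigma_exp))
    return numerator / denominator


def smear(nominal, sigma_exp, sigma_theo, rng):
    """One toy data set: gaussian experimental plus flat theory smearing."""
    return [d + rng.gauss(0.0, s) + rng.uniform(-t, t)
            for d, s, t in zip(nominal, sigma_exp, sigma_theo)]


def toy_parameter_error(true_m, slopes, sigma_exp, sigma_theo, n_toys=20000, seed=1):
    """Fit every toy data set; the spread of the best-fit values is the error."""
    rng = random.Random(seed)
    nominal = [a * true_m for a in slopes]
    fits = [best_fit(smear(nominal, sigma_exp, sigma_theo, rng), slopes, sigma_exp)
            for _ in range(n_toys)]
    return statistics.mean(fits), statistics.stdev(fits)


if __name__ == "__main__":
    mean, error = toy_parameter_error(100.0, [1.0, 2.0], [1.0, 1.0], [0.5, 0.5])
    print(f"best-fit mean {mean:.3f}, parameter error {error:.3f}")
```

The cost is one fit per toy data set, which is why this route is slower than a single Minuit error estimate.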

III. MSUGRA 

No model for supersymmetry breaking should be assumed for analyses. Instead, the 
breaking mechanism should be inferred from data. 

However, the supersymmetric parameter space can be simplified by unification assumptions,
leading to an easily solvable problem. A simple Minuit fit is sufficient to determine the
MSUGRA [31] parameters $m_0$, $m_{1/2}$, $A_0$ and $\tan\beta$ from the mass or endpoint measurements
at the LHC and/or ILC. The correct sign of $\mu$ is determined by the quality of the fit, which
is worse for the hypothesis with the wrong sign. Such a fit can be an uncorrelated gaussian
fit or it can include all correlations and correlated errors, and none of the errors has
to be assumed to be gaussian. Using SFitter a log-likelihood fit is performed, extracting
the best-fitting point in the respective MSUGRA (or later MSSM) parameter space and
determining the errors including all correlations.

Because of the sizeable error on the top mass (LHC target: 1 GeV; ILC target: 0.12 GeV),
the top mass or Yukawa must be included in any SUSY fit. In a way, the running
top Yukawa is defined at the high scale as one of the MSUGRA model parameters, which
through coupled renormalization group running predicts all low-scale masses, including the
top-quark mass, all supersymmetric partner masses, and the light Higgs mass. In
principle, this approach should be taken for all Standard Model parameters, couplings and
masses [32], but at least for moderate values of $\tan\beta$ for example the bottom Yukawa
coupling has negligible impact on the extraction of supersymmetric parameters.

The first question to be discussed in the simplified MSUGRA context is whether it is 
possible to unambiguously identify the correct parameters from a set of observables and 
their errors. In other words, which parameter point has the largest likelihood value $p(d|m)$,
evaluated as a function over the model-parameter space $m$ for a given data set $d$? Note
that discrete model assumptions (like MSUGRA vs extra dimensions) are not included. 
Instead, one model with a multi-dimensional vector of continuous parameters is scanned. 
The question immediately arises if there are secondary maxima in the likelihood map of the 
parameter space. 

In a one-dimensional problem the probability distribution function (pdf) $p(d|m)$ for an
observable $d$ given a vector of model parameters $m$ can be used to compare two hypotheses
for a given data set: decide which of the two hypotheses $m_j$ with their central values $d^*_j$ is
preferred and compute the integral over the 'wrong' pdf $p(d|m_{\rm wrong})$ from $d^*_{\rm right}$ to infinity.
This integral gives the confidence level of the decision in favor of one of the two hypotheses.
Note that this extraction applies to discrete and to continuous parameter determination, 
but it requires that we start from a mathematically properly defined pdf in the observable 
space. 
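
For a concrete feeling of this integral, the sketch below assumes that the 'wrong' pdf is a gaussian of width sigma centred on its own central value, and that the 'right' central value lies above it. Both are assumptions made only for this illustration, and the function name is hypothetical.

```python
import math


def wrong_hypothesis_tail(d_right, d_wrong, sigma):
    """Integral of a gaussian 'wrong' pdf (mean d_wrong, width sigma)
    from the 'right' central value d_right to infinity."""
    return 0.5 * math.erfc((d_right - d_wrong) / (math.sqrt(2.0) * sigma))


if __name__ == "__main__":
    # Assumed example: central values separated by two standard deviations.
    print(wrong_hypothesis_tail(d_right=102.0, d_wrong=100.0, sigma=1.0))
```

The smaller this tail integral, the less likely it is that a fluctuation of the wrong hypothesis mimics the right one.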

For the procedure described above the Neyman-Pearson lemma states that if the correct 
hypothesis is picked as 'right', a likelihood-ratio estimator will produce the smallest possible 
type-II error, i.e. the smallest error caused by mistaking a fluctuation of 'wrong' for 'right'. 
A likelihood ratio can be extracted from simulations, or from data combined with
simulations, or from data alone. To test well-defined hypotheses using powerful data,
including for example the top-mass measurement, likelihood methods can yield impressive 
results. Such a likelihood method can easily be generalized to high-dimensional observable 
spaces or model-parameter vectors, as long as it is applied to properly defined probability 
distributions. The crucial and highly controversial question is how to produce a pdf when the 
parameter space is high-dimensional and poorly constrained dimensions of it are ignored. 

SFitter provides the relevant frequentist or Bayesian results in three steps: first (1), 
SFitter computes a log-likelihood map of the entire parameter space. This map is completely 
exclusive, i.e. it includes all dimensions in the parameter space. Then (2), SFitter ranks 
the best local likelihood maxima in the map according to their log-likelihood values. It 
identifies the global maximum, and a bias towards secondary maxima (e.g. SUSY breaking 
scenario) can be included, without mistaking such a prior for actual likelihood. Last (3), 
SFitter computes profile likelihood or Bayesian probability maps of lower dimensionality, 
down to one-dimensional distributions, by properly removing or marginalizing unwanted 
parameter dimensions. Only in this final step frequentist and Bayesian approaches need to 
be distinguished. The three steps are illustrated in the Appendix for a simple toy model. 



A. Likelihood analysis 

Looking for example at the parameter point SPS1a at the LHC, different parameters
are heavily correlated, some parameters are only poorly constrained, and distinct
maxima in the likelihood $\mathcal{L}$ can differ by $\mathcal{O}(N)$, where $N$ is the number of observables. Therefore,
one would like to produce probability distributions or likelihoods over subspaces of the
model-parameter space from the fully exclusive likelihood map. In other words, unwanted
dimensions of the parameter space are eliminated until only one- or two-dimensional 'likelihoods'
remain. The likelihood cannot just be integrated unless a measure is defined in the
model space. This measure automatically introduces a bias and leads to a Bayesian pdf.

FIG. 1: SFitter output for MSUGRA in SPS1a. Upper left: list of the best log-likelihood values
over the MSUGRA parameter space. Upper right: two-dimensional profile likelihood over the
$m_0$-$m_{1/2}$ plane. Lower: one-dimensional profile likelihoods $1/\chi^2$ for $m_0$ and $m_{1/2}$. All masses are
given in GeV.

Instead, in this section a profile likelihood is used: for each (binned) parameter point
in the $(n-1)$-dimensional space we explore the $n$th direction which is to be removed,
$\mathcal{L}(x_1, \ldots, x_{n-1}, x_n)$. The best value $\mathcal{L}^{\max(n)}$ is picked along this direction and its function value
is identified with the lower-dimensional parameter point, $\mathcal{L}(x_1, \ldots, x_{n-1}) \equiv \mathcal{L}^{\max(n)}(x_1, \ldots, x_{n-1}, x_n)$.

Using this kind of projection most notably guarantees that the best-fit points always survive
to the final representation, unless two of them belong to the same bin in the reduced
parameter space.
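
A profile-likelihood projection of a list of sampled points can be sketched in a few lines. The data layout (a list of parameter tuples with their likelihood values), the common bin width and the function name are assumptions for this illustration and do not describe SFitter's internal format.

```python
import math


def profile_projection(points, keep, bin_width):
    """Project sampled points onto the parameter indices in `keep`.

    For every bin in the reduced space only the largest likelihood found
    along the removed directions is kept."""
    profile = {}
    for params, likelihood in points:
        key = tuple(math.floor(params[k] / bin_width) for k in keep)
        if likelihood > profile.get(key, float("-inf")):
            profile[key] = likelihood
    return profile


if __name__ == "__main__":
    # Hypothetical points (m0, m1/2, A0) with likelihood values.
    sample = [
        ((100.0, 250.0, -100.0), 0.9),
        ((100.0, 250.0, 700.0), 0.6),
        ((150.0, 300.0, -50.0), 0.2),
    ]
    print(profile_projection(sample, keep=(0, 1), bin_width=10.0))
```

In this example the two solutions that differ only in the removed direction fall into the same bin of the reduced space, so only the better one remains visible, which is exactly the caveat stated above.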

For the MSUGRA case the likelihood map is computed over the entire parameter space
given a smeared LHC data set. This map covers the model parameters $m_0$, $m_{1/2}$, $A_0$, $B$, $m_t$,
where $B$ is later traded for the weak-scale $\tan\beta$, as described in Sec. III C. Usually $\tan\beta$
will be shown, because this parameter has a more obvious interpretation in the weak-scale
theory.

The SFitter result is shown in Fig. 1: a completely exclusive map over the 5-dimensional
parameter space is the starting point. Combining 30 Markov chains, 600000 model-parameter
points are collected. For the renormalization group running SoftSUSY is used with an
efficiency of 25 to 30%, which corresponds to a few hours of CPU time for each of the
30 chains. Because the resolution of the Markov chain is not sufficient to resolve each
local maximum in the log-likelihood map, an additional maximization algorithm (Minuit's
Migrad) starts at the best points of the Markov chains to identify the local maxima.
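
The two-stage strategy (coarse Markov chains followed by a local maximization from the best chain points) can be sketched on a one-dimensional toy. Everything below is an assumption made for illustration: the toy log-likelihood with two maxima, a plain Metropolis chain instead of SFitter's weighted Markov chains, and a simple step-halving hill climb standing in for Migrad.

```python
import math
import random

WIDTH = 0.3  # assumed width of both toy maxima


def log_like(x):
    """Toy log-likelihood: best maximum at x = 1, secondary maximum at x = -2."""
    main = -((x - 1.0) ** 2) / (2.0 * WIDTH ** 2)
    secondary = math.log(0.5) - ((x + 2.0) ** 2) / (2.0 * WIDTH ** 2)
    return max(main, secondary)


def metropolis(func, start, n_steps, step_size, rng):
    """Plain Metropolis chain; returns the visited points with their values."""
    x, fx = start, func(start)
    chain = [(x, fx)]
    for _ in range(n_steps):
        trial = x + rng.gauss(0.0, step_size)
        f_trial = func(trial)
        if f_trial >= fx or rng.random() < math.exp(f_trial - fx):
            x, fx = trial, f_trial
        chain.append((x, fx))
    return chain


def hill_climb(func, x, step=0.1, tol=1e-7):
    """Local maximization by stepping uphill and halving the step when stuck."""
    best = func(x)
    while step > tol:
        moved = False
        for trial in (x - step, x + step):
            value = func(trial)
            if value > best:
                x, best, moved = trial, value, True
        if not moved:
            step /= 2.0
    return x, best


def ranked_maxima(starts, n_steps=2000, step_size=0.5, seed=7):
    """Run one chain per start, refine its best point, rank the maxima found."""
    rng = random.Random(seed)
    maxima = {}
    for start in starts:
        chain = metropolis(log_like, start, n_steps, step_size, rng)
        best_x = max(chain, key=lambda point: point[1])[0]
        x, value = hill_climb(log_like, best_x)
        maxima[round(x, 3)] = value
    return sorted(maxima.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    for position, value in ranked_maxima([-3.0, 0.0, 3.0]):
        print(f"x = {position:6.3f}   log-likelihood = {value:8.4f}")
```

The chain only needs to land in the basin of a maximum; the local step then pins down its position far more precisely than the chain resolution allows.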
In Fig. 1 the best-fit points in the MSUGRA parameter space are shown, as obtained
from the 5-dimensional likelihood map. For the SPS1a parameter point a general pattern of
four distinct maxima emerges in the likelihood: first, the trilinear coupling can assume the
correct value of around -100 GeV, but it can also become large and positive, around 700 GeV.
This degeneracy is correlated with a slight shift in the top mass, which means it will be
much less pronounced if the top quark mass is not part of the MSUGRA parameter set.
This correlation occurs through the light Higgs mass and its strength largely depends on the
theory error assumed for the Higgs mass. Secondly, a similar feature is present for each sign
of $\mu$, correlated with a slight shift in $\tan\beta$, which compensate each other in the
neutralino-chargino sector. Such a degeneracy is expected, because at the LHC only one of the two
heavy neutralinos is observed. Including the precise and more complete ILC measurements,
this degeneracy should vanish.

An example correlation between two model parameters is the profile likelihood in the
$m_0$-$m_{1/2}$ plane, after projecting away the $A_0$, $B$, sign($\mu$) and $m_t$ directions. The likelihood
maximum starts from the true values $m_0 = 100$ GeV and $m_{1/2} = 250$ GeV and continues
into two branches. These branches reflect the fact that extracting masses from kinematic
endpoints involves quadratic equations. Ignoring such correlations between parameters, the
two-dimensional profile likelihood is projected onto each of the two remaining directions.
Both distributions show sharp maxima of the profile likelihoods in the correct places,
because the resolution is not sufficient to resolve the four distinct solutions for $A_0$ and sign($\mu$).
Note that all these profile likelihood distributions are mathematically not probability
distributions, because projecting on a parameter subspace does not protect the normalization of
the original likelihood map (which can be viewed as a probability distribution).

Thus eliminating a dimension in the parameter space means loss of information. Therefore,
it is not obvious that producing low-dimensional distributions from the completely
exclusive likelihood map is always sensible. An example is the correlation of $m_t$ and $A_0$:
as mentioned before, a strong correlation from the Higgs mass measurement is expected.
Fig. 2 shows the two-dimensional and one-dimensional profile likelihoods in the $m_t$-$A_0$
subspace. In the two columns the two signs of $\mu$ are separated; from the list of maxima the
best-fit points are expected to be roughly 1 GeV higher in $m_t$ and 30 to 80 GeV higher in
$A_0$ for $\mu > 0$.

Locally, the two-dimensional profile likelihoods around the maxima show little correlation
between $m_t$ and $A_0$. The correct value around $A_0 = -100$ GeV is preferred, but the
alternative solution around $A_0 = 900$ GeV is clearly visible. On top of this double-maximum
structure for both signs of $\mu$ there is a parabola-shaped correlation between $m_t$ and $A_0$.
The apex of the parabola is roughly 5 GeV above the best fits in $m_t$. This correlation
becomes invisible once one of the two parameter directions is projected away and the
one-dimensional profile likelihoods are analyzed. The two alternative solutions do not appear in
the $m_t$ histogram, because the alternative maximum is relatively unlikely and because the
two best-fit values for $m_t$ differ by a mere GeV. The same is true for $A_0$, where only a tiny
tail towards the wrong solution can be seen.

Since only one measurement smeared according to the gaussian experimental errors is
used for the parameter extraction shown in Fig. 1, the correct values do not have to coincide
with the best log-likelihood among the local maxima. As a matter of fact, just changing the
theory errors from the correct flat to a possibly approximate gaussian shape can have an
effect on the ranking of maxima: for gaussian theory errors the $\chi^2$ values of 4.35, 26.1, 10.5,
22.6 appear in the order shown in Fig. 1. In other words, just smearing the measurements
can indeed shift the ordering of the best local maxima, supporting our claim that a careful
look at more than just the best solution might make sense in a parameter space as complex
as MSUGRA.

FIG. 2: SFitter output for $A_0$ and $y_t$. The two columns of one- and two-dimensional profile
likelihoods correspond to $\mu < 0$ (left) and $\mu > 0$ (right). A map is shown in the first row and
$1/\chi^2$ distributions in the second and third.

Even if such inversions arise, the parameter determination can be repeated with different
(smeared) sets of observables. The frequency with which the wrong parameter set corresponds
to the lowest $\chi^2$ value is a measure of how seriously degenerate the alternative maxima
are.



B. Bayesian approach 



A likelihood analysis as presented in the last section is unfortunately not designed to
produce probability distributions for model parameters. This means it will not answer
questions of the kind: in the light of electroweak precision constraints and dark matter
constraints, what sign of $\mu$ is preferred in MSUGRA [37]? Note that this is not the same
question as: what is the relative difference in the likelihood for the two best points on
each side of $\mu = 0$? To answer the first question the likelihood over each of the two halves of
the parameter space needs to be integrated. All parameter dimensions except for $\mu$
must be integrated over to compute the pdf for $\mu$ given the data. For such an integration
leading to lower-dimensional probability distributions a measure has to be introduced, the
(Bayesian) prior. This prior has its advantages, but it can also lead to unexpected effects,
as shown in the following [32].

FIG. 3: SFitter output for MSUGRA in SPS1a. Upper left: list of the largest log-likelihood values
over the MSUGRA parameter space. Upper right: two-dimensional Bayesian pdf over the
$m_0$-$m_{1/2}$ plane marginalized over all other parameters. Lower: one-dimensional Bayesian pdfs $1/\chi^2$
for $m_0$ and $m_{1/2}$. All masses are given in GeV.



One might argue that such questions are irrelevant because the goal is to find the correct, 
i.e. the most likely parameter point. On the other hand, asking for a reduced-dimensionality 
probability density could well be a very typical situation in the LHC era. Questions like: 
what kind of linear collider should be built given LHC data? What is the most likely 
mechanism for dark-matter annihilation? How to detect dark matter? deserve well-defined 
answers. 

As discussed before, shifting from a frequentist to a Bayesian approach does not affect the
main part of the SFitter program. In other words, SFitter produces Bayesian probability
distributions or profile likelihoods without any preference. While not strictly necessary in a
Bayesian analysis, the top-likelihood points from Fig. 1 also appear in the Bayesian results
shown in Fig. 3. The second panel in Fig. 3 now shows a two-dimensional representation of
the Bayesian pdf over the MSUGRA parameter space. All parameter dimensions except for
m_0 and m_1/2 are marginalized using flat priors. The only slight complication arises from the
treatment of B or tan β, as described in Sec. III C. Unless explicitly stated otherwise the
prior is flat in the high-scale mass parameter B. The results are typically shown in terms
of tan β, because this parameter is easier to interpret at the weak scale.

In the two-dimensional pdf shown in Fig. 3 the same two-branch structure appears as
for the profile likelihood. However, there are two differences: first, the area around the true
parameter point is less pronounced in the Bayesian pdf, compared to the profile likelihood.
In the integration over a direction in parameter space noise gets collected from regions with
a finite but insignificant likelihood. This noise washes out the peaked structures, while the
profile likelihood by construction keeps mainly these best-fit structures. This effect also
considerably smears the one-dimensional Bayesian pdf distributions in m_0 and m_1/2.

FIG. 4: SFitter output for A_0 and y_t. The two columns of marginalized Bayesian pdfs correspond
to μ < 0 (left) and μ > 0 (right). For illustration purposes the parameters m_0 and m_1/2 are only
marginalized around their best-fit values. We show a map in the first row and distributions
in the second and third.

Secondly, the branch structure is more pronounced in Fig. 3. While in the profile likelihood
the area between the two branches is filled by single good parameter points in the
parameters projected away, the Bayesian marginalization provides 'typical' likelihood values
in this region, which in general do not fit the data as well.
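The difference between the two ways of removing a parameter can be made concrete with a toy example. The following sketch is not taken from SFitter: the two-dimensional likelihood, the grid and all names are invented for illustration. It combines a narrow, high peak with a broad, low region and compares the profile likelihood with the marginalization under a flat prior.

```python
import math

def likelihood(x, y):
    """Toy likelihood: narrow, high peak at (1, 0) plus a broad, low region around x = -1."""
    narrow = math.exp(-((x - 1.0) ** 2 + y ** 2) / (2 * 0.05 ** 2))
    broad = (0.05 * math.exp(-(x + 1.0) ** 2 / (2 * 0.5 ** 2))
             * math.exp(-y ** 2 / (2 * 3.0 ** 2)))
    return narrow + broad

dy = 0.02
xs = [-2.0 + 0.02 * i for i in range(201)]   # parameter of interest
ys = [-10.0 + dy * j for j in range(1001)]   # parameter to be removed

profile, marginal = [], []
for x in xs:
    column = [likelihood(x, y) for y in ys]
    profile.append(max(column))        # profile likelihood: best value along y
    marginal.append(sum(column) * dy)  # flat prior in y: integrate along y

x_profile = xs[profile.index(max(profile))]
x_marginal = xs[marginal.index(max(marginal))]
print(x_profile, x_marginal)
```

In this toy the profile likelihood peaks at the narrow maximum, while the marginalized distribution peaks in the broad region, because the integration rewards volume in the direction that is removed.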

Again in complete analogy to the likelihood analysis the study of the correlation of m_t
and A_0 serves as an example of how marginalizing parameters can weaken the understanding
of the parameter space, independent of the frequentist or Bayesian approach. Fig. 4 shows
the Bayesian pdfs for m_t and A_0. Because of the strongly peaked likelihood map in the m_0
and m_1/2 directions a full marginalization is not applied in these directions. Instead, the
mass parameters m_0 and m_1/2 are marginalized only in a frame of ±2 GeV and tan β is varied
by ±1.5, always around the best-fit point for each sign of μ. This additional constraint
or bias can be useful when producing a marginalized Bayesian pdf for comparably poorly
measured parameters. In order not to be misled it is necessary to explicitly check that
the partly marginalized parameters m_0, m_1/2, tan β are not significantly correlated with the
remaining m_t and A_0.

In m_t the Bayesian pdf is not symmetric with respect to the central values for each sign
of μ. This asymmetry of the tails arises from the parabola shape of the m_t-A_0 correlation.
The large-likelihood region around the apex becomes more important than the far-away
arms of the parabola after marginalizing A_0. This is a typical volume effect in Bayesian
statistics. At first sight these asymmetric tails of the Bayesian pdf for m_t seem to disagree
with its profile likelihood, but it is a physics effect, i.e. a correlation marginalized away.
This result is useful when it comes to trying to resolve such a correlation, but by no means
problematic.

Comparing the profile likelihood and the Bayesian pdf for A_0 the volume effects significantly
enhance the relative weight of the secondary maximum at A_0 ~ 800 GeV. Moreover,
comparing the likelihood scales for μ < 0 and (the correct) μ > 0, the relative enhancement
of the Bayesian pdf is almost an order of magnitude, while the binned best-fit points differ
by only a factor 5 for the profile likelihood.



C. Purely high scale model 

Strictly speaking, the usual set of MSUGRA model parameters contains on the one hand the
high-scale mass parameters m_0, m_1/2, A_0, and on the other hand the weak-scale ratio of vacuum
expectation values tan β = v_2/v_1, which explicitly assumes radiative electroweak symmetry
breaking. Minimizing the potential in the directions of both vevs gives the two conditions [38]

\begin{align}
\mu^2 &= \frac{m_{H_2}^2 \sin^2\beta - m_{H_1}^2 \cos^2\beta}{\cos 2\beta} - \frac{1}{2} m_Z^2 \notag \\
2 B \mu &= \tan 2\beta \left( m_{H_1}^2 - m_{H_2}^2 \right) + m_Z^2 \sin 2\beta \tag{3}
\end{align}

The masses m_{H_i} correspond to the two Higgs doublets in the type-II two-Higgs-doublet
model of the MSSM. H_1 has a tree-level coupling only to down-type fermions, while H_2
couples to up-type fermions only. The mass-squared parameter Bμ appears in front of
mixed terms of the kind H_1 H_2. Assuming electroweak symmetry breaking, usually m_{H_i}
and tan β are used to compute the mass parameters B and μ, given the well-measured
Standard Model parameter m_Z. In MSUGRA the two scalar Higgs masses at the high scale
are given by m_0, so in fact only m_0 and tan β are used.

A well-motivated alternative is to replace tan β with B as a high-scale input parameter
together with m_0 and compute tan β and μ (modulo its sign) assuming electroweak symmetry
breaking and the Z mass. This approach has the advantage that all input parameters are
high-scale mass parameters. This does not make a difference for the frequentist profile-likelihood
map, but in a Bayesian approach, which takes volume effects into account, it does.

FIG. 5: SFitter output for tan β. The first row shows Bayesian pdfs with a flat prior in B, the
second row Bayesian pdfs with a flat prior in tan β, and the last row profile likelihoods.

To illustrate the effects of flat priors either in B or in tan β the Bayesian pdfs and the
profile likelihoods are shown in the m_0-tan β plane and the one-dimensional tan β distributions
in Fig. 5. From the best-fit points in Fig. 1 even after including theory errors the correct
value for tan β can be determined from the set of LHC measurements. However, the first
row of plots in Fig. 5 clearly shows that with a flat prior in B the one-dimensional Bayesian
pdf is largely dominated by noise and by a bias towards as small as possible tan β. This
bias is simply an effect of the flat prior in B. Switching to a flat prior in tan β, noise effects
are still dominant, but the maximum of the one-dimensional Bayesian pdf is in the correct
place. As expected, the profile likelihood picks the correct central value of tan β ≈ 12 for
the smeared parameter point.
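The origin of such a bias can be illustrated with a simple change of variables. The sketch below rests on a strong simplification that is an assumption of this example and not the full MSUGRA relation: at tree level and for fixed μ and m_A the parameter B is proportional to sin 2β. A flat prior in B then induces a prior in tan β given by the Jacobian |dB/d tan β|. The function names and the normalization are illustrative.

```python
def b_of_tanb(tanb, scale=1.0):
    """Toy relation B ~ sin(2 beta) = 2 tanb / (1 + tanb^2); 'scale' stands in for m_A^2 / (2 mu)."""
    return scale * 2.0 * tanb / (1.0 + tanb ** 2)

def induced_prior(tanb, h=1e-5):
    """Density in tan(beta) implied by a flat prior in B: |dB/dtanb| via a central difference."""
    return abs(b_of_tanb(tanb + h) - b_of_tanb(tanb - h)) / (2.0 * h)

for tanb in (2.0, 5.0, 10.0, 30.0):
    # prior weight relative to tan(beta) = 10
    print(tanb, induced_prior(tanb) / induced_prior(10.0))
```

For large tan β the induced weight falls off like 2/tan²β in this toy relation, so a flat prior in B favors small values of tan β, in line with the bias described above.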

D. Errors on parameters 

Once a best-fit point has been determined from any set of measurements, the question 
arises what the precision of the determination of the parameters is. First the case for LHC 
measurements is studied and then the impact of the ILC is evaluated. 

1. LHC: masses vs kinematic endpoints 

To determine the central values and the errors on the fundamental parameters two different
approaches are available for the LHC measurements. Either the kinematical endpoints
or the particle masses (from a fit to the endpoints without any model assumptions [11])
can serve as data. The first question is how an extraction of the MSUGRA model parameters
from the kinematic endpoints listed in Tab. II compares to an extraction from the mass
measurements listed in Tab. I.

TABLE III: Best-fit results for MSUGRA at the LHC derived from masses and endpoint measurements
with absolute errors in GeV. The big columns correspond to mass and endpoint measurements.
The subscript represents neglected, (probably approximate) gaussian or proper flat theory
errors. The experimental error includes correlations unless indicated otherwise in the superscript.
The top mass is quoted in the on-shell scheme.



Because the extraction of masses from endpoints is highly correlated, both approaches are
only equivalent if the complete correlation matrix of masses is taken into account. For the
experimental errors the mass determination from edges introduces non-trivial correlations
in the masses, whereas the theory errors are essentially uncorrelated in the masses, but
non-trivially correlated in the endpoints.

Numerically, theory errors cannot be neglected. In particular, the determination of tan β
and A_0 largely relies on the light Higgs mass, which can be computed in perturbation
theory [33]. This calculation has a parametric error, e.g. from the top Yukawa, and a
systematic error due to unknown higher orders. The parametric errors are correlated with
the direct mass measurements, which means they do not enter as theory errors from the Higgs
mass calculation. The remaining theory error on the light Higgs mass due to unknown
higher-order terms can be estimated to lie around 2 GeV [33]. For the top pole-mass
measurement an experimental error of 1 GeV is expected at the LHC and therefore used in
the analysis. As long as the experimental error stays above roughly a GeV, the theory error
on the top mass from the unknown renormalization scheme of m_t at a hadron collider [s^
should be small, of the order of Λ_QCD, i.e. at most around a GeV.

For supersymmetric partner masses in MSUGRA theory errors arise mostly from the
limited perturbative order of the renormalization group running [40]. Moreover, at the weak
scale higher-order corrections have to be taken into account when converting Lagrangian
mass parameters into physical masses. The combined theory errors are estimated to be an
uncorrelated 1% (3%) for weakly (strongly) interacting particles [21, 22]. If a parameter
point does not predict one of the endpoints included in the set of observables, the likelihood
of this parameter point is set to zero.

TABLE IV: The (symmetric) correlation matrix of all SUSY parameters in the MSUGRA fit using
endpoint measurements at the LHC and including approximate gaussian (left panel) and proper
flat (right panel) theory errors.

Left panel (approximate gaussian theory errors):

          m_0     m_1/2   tan β   A_0     m_t
m_0       1       0.485   0.523   0.042   0.063
m_1/2             1      -0.100   0.648   0.449
tan β                     1      -0.467  -0.192
A_0                               1       0.495
m_t                                       1

Right panel (proper flat theory errors):

          m_0     m_1/2   tan β   A_0     m_t
m_0       1       0.501   0.432   0.094   0.214
m_1/2             1      -0.206   0.740   0.720
tan β                     1      -0.401  -0.256
A_0                               1       0.648
m_t                                       1

The errors on the MSUGRA parameters for different assumptions are shown in Tab. III.
Changing from mass measurements to endpoint measurements (for gaussian experimental
errors and no correlations) improves the errors by a factor of more than three for m_0 and a
factor two for the gaugino mass parameter m_1/2. This improvement arises from the absence
of the correlation matrix between the mass observables. If this matrix were known, the
results would be similar. As a next step, again using only experimental errors, but taking
into account the correlation of the systematic energy-scale errors (JES and LES) a further
improvement of a factor two for the common scalar mass parameter and a slight improvement
for the gaugino mass parameter is observed. This comparison shows that to obtain the best
precision from the LHC data, it is important to correctly estimate the correlation between
the observables.
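The size of such an effect can be estimated with standard gaussian error propagation. The following sketch uses invented numbers and names and is not derived from the tables of this analysis: two masses extracted from the same endpoints are assumed to be strongly correlated, and the error on their difference is computed with and without the correlation.

```python
import math

def error_on_difference(sigma1, sigma2, rho):
    """Gaussian error propagation for delta = m1 - m2 with correlation coefficient rho."""
    variance = sigma1 ** 2 + sigma2 ** 2 - 2.0 * rho * sigma1 * sigma2
    return math.sqrt(variance)

sigma_m1, sigma_m2 = 5.0, 4.8  # hypothetical absolute mass errors in GeV
rho = 0.95                     # hypothetical correlation from the endpoint fit

naive = error_on_difference(sigma_m1, sigma_m2, 0.0)
correlated = error_on_difference(sigma_m1, sigma_m2, rho)
print(naive, correlated)
```

Dropping the correlation inflates the error on the mass difference by more than a factor of four in this example, which is the kind of information lost when masses are used without their correlation matrix.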

The impact of theory errors on the parameter determination is shown in the next columns
where first the gaussian (approximate) and then the flat (proper) theory error is studied.
For the well-measured scalar and gaugino mass parameters m_0 and m_1/2 the theory error
increases the small purely experimental error considerably. For the ratio of the vacuum
expectation values tan β the theory error on the Higgs mass becomes the dominant source of
error, because the experimental precision on the Higgs mass measurement is almost a factor 10
better than its theory error. In the SPS1a parameter point the two different techniques of treating
the theory error give the same results within 20%. Note that the precision of the top
mass parameter as part of the SUSY ensemble is slightly better than the direct top mass
measurement alone.



As expected, the correlation matrix between the different MSUGRA parameters is by no
means diagonal. In Tab. IV m_1/2 and tan β are largely uncorrelated, as are A_0 and tan β.
The latter is somewhat unexpected in the light of the Higgs-mass measurement, but it can
be understood by the pseudo-fixpoint behavior of A_t as a function of A_0 and by the fact
that the important parameter in the Higgs mass calculation is the light stop mass, which
depends critically on m_0 and slightly on m_1/2 [38]. The two mass parameters m_0 and m_1/2
are strongly correlated through the renormalization group running of the squark and slepton
masses. Similarly, A_0 and m_1/2 are strongly correlated.

Through most of this analysis SoftSUSY [36] is the workhorse for the renormalization-group
evolution to link the high-scale MSUGRA model parameters with the weak-scale
masses and other observables, including some higher-order corrections. As a consistency
check on the theory errors, the observables were calculated with SoftSUSY, but the model
parameters were determined with SuSpect [10]. While the central values are shifted as
expected, they are compatible within Sex, thus giving confidence that the estimated theory 
errors cover at least the different theoretical calculations. 



The distribution of 10000 individually run best-fit results to smeared data samples
(pseudo-measurements) is shown in Fig. 6. Such a histogram is simply the numerical
simulation of error propagation [41] and should in the gaussian case reproduce the same result
as a convolution of the different gaussian errors. For the first two rows only gaussian
experimental errors are assumed and (hopefully approximate) gaussian theory errors. Both of
the resulting distributions for m_0 are gaussian, as are all the other distributions not shown
here. For the third row the correct flat theory errors are shown. The m_0 distribution is now
slightly too narrow to be gaussian. On the other hand, all one-dimensional distributions
are surprisingly similar to gaussian. However, this just reflects the central limit theorem,
namely that a quantity which combines many independent error sources tends towards a
gaussian distribution, independent of the shape of the individual errors.

FIG. 6: SFitter output for m_0 and χ². For different assumptions for the theoretical error (neglected,
gaussian and flat theoretical error from top to bottom) histograms for 10000 pseudo-measurements
are shown. The dotted blue line shows a fitted gaussian for the m_0 plots and a χ²-distribution
with 16 degrees of freedom for the χ² plots, respectively.

Depending on the relative impact of the different errors and on the detailed correlations,
a non-gaussian behavior can be more or less pronounced for a finite number of attempts.
For example, m_1/2 is dominantly gaussian, even including flat theory errors, while the A_0
distribution is wide and not gaussian at all. As a check the distribution of the log-likelihood
was computed and compared to the gaussian assumption. For neglected or gaussian theory
errors the log-likelihood distribution matches a χ² distribution with the correct number
of degrees of freedom. For flat theory errors the prescription effectively removes measurements
which are within the theory-error bands from the counting of the degrees of freedom,
thereby lowering the effective value of χ².
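The effect on the counting can be made explicit with a small sketch. It assumes the common box-shaped prescription, in which a deviation inside the theory-error band costs nothing and only the excess is compared to the experimental error. The function names and the numbers are illustrative and not taken from the SFitter code.

```python
def chi2_flat_theory(data, prediction, sigma_exp, sigma_theo):
    """chi^2 contribution of one measurement with a flat (box-shaped) theory error."""
    deviation = abs(data - prediction)
    if deviation <= sigma_theo:
        return 0.0  # inside the theory band: no penalty at all
    return ((deviation - sigma_theo) / sigma_exp) ** 2

def chi2_gaussian_theory(data, prediction, sigma_exp, sigma_theo):
    """chi^2 contribution with the theory error added in quadrature."""
    return (data - prediction) ** 2 / (sigma_exp ** 2 + sigma_theo ** 2)

# hypothetical light Higgs mass: 0.25 GeV experimental and 2 GeV theory error
inside = chi2_flat_theory(111.0, 110.0, 0.25, 2.0)
outside = chi2_flat_theory(113.0, 110.0, 0.25, 2.0)
gauss = chi2_gaussian_theory(111.0, 110.0, 0.25, 2.0)
print(inside, outside, gauss)
```

A measurement inside the band contributes exactly zero, so it no longer counts as a degree of freedom, whereas the gaussian treatment always yields a small but non-zero contribution.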

In the list of measurements in Tab. I the LHC will only identify three out of
four neutralinos: the third-heaviest neutralino will be missed due to its higgsino nature.
Higgsino-neutralino couplings to light-flavor fermions and sfermions are largely suppressed
and can only be observed in cascade decays through gauge bosons or possibly a Higgs [1].
The question is what happens if the fourth-heaviest neutralino is wrongly labeled as
third-heaviest. SFitter indeed finds a best-fitting parameter point to fit this data set. This point is
slightly shifted in m_0 and m_1/2 by up to 1 GeV. The largest difference between the correctly
and wrongly assigned parameter points occurs in tan β, which is shifted by about 2. The
χ² value remains reasonable for both points.

TABLE V: Best-fit results for MSUGRA at the LHC (endpoints) and including ILC measurements.
Only absolute errors are given. The LHC results correspond to Tab. III including flat theory errors.

While at first sight the set looks like a bona fide alternative minimum, it can easily
be discarded using LHC data. Having determined the 'wrong' model parameters, the full
spectrum and couplings can be predicted. In particular, the fourth neutralino now has a mass
of about 400 GeV. For example, more squark decays to χ^0_4 than to χ^0_3 are predicted for this
'wrong' parameter point, in contradiction to the data sample. Unfortunately, distinguishing
such discrete alternative descriptions relies on signatures which would have to be seen. At
the LHC, what can and what cannot be seen is determined by Standard Model backgrounds
and detector effects, which makes an automated answering algorithm unrealistic.

2. Impact of the ILC 

Combining LHC data with data from a future linear collider shifts the focus even further
towards the determination of the errors on the MSUGRA parameters. As shown in Tab. V the
errors on the parameters from ILC measurements alone are already considerably smaller than
the LHC errors. This is true for all MSUGRA parameters: for example, the gluino mass, which
is not measured at the ILC, is not necessary because the weak gaugino masses are
known. The general improvement of the errors is expected, since mass measurements at the
ILC are about an order of magnitude more precise. The resulting improvement in precision
on the model parameters is about a factor 5. Combining ILC and LHC measurements
in MSUGRA only leads to a marginal additional improvement of the errors, even though
squarks and gluinos largely escape the ILC analyses. The reason is that the precision of
m_0 (simple error calculation) is dominated by the slepton masses alone. Comparing the
LHC+ILC errors with and without theory errors shows the margin for the improvement of
theory predictions, justifying the SPA project [42].

The correlation between the parameter measurements is different once the ILC measurements
are included. For example, A_0 and tan β are now largely correlated. Such a correlation
appears in the measurement of the off-diagonal entries of the scalar mixing matrices as well
as in m_h. In contrast to the LHC measurement, the top Yukawa is now largely uncorrelated
with all MSUGRA parameters, because it can be independently determined using the
0.12 GeV measurement of the physical top mass.



IV. WEAK SCALE MSSM LAGRANGIAN 

If supersymmetry or other new physics is observed at the TeV scale the weak-scale Lagrangian
should be determined from data. High-scale models for example of SUSY breaking
then have to be inferred from this TeV-scale data [16, 17, 43]. This problem is what SFitter
is really designed to solve, after being tested extensively in the lower-dimensional MSUGRA
parameter space.

The complete parameter space of the MSSM can have more than 100 parameters. However,
at experiments like the LHC some new-physics parameters can be fixed because no
information on them is expected. This for example includes CP phases [41] or non-minimal
flavor violation |^ for weak-scale high-p_T measurements at the LHC. It also includes the
first and second generation trilinear couplings A_l,u,d, which in minimal flavor violation are
multiplied by the corresponding Yukawa coupling and which beyond minimal flavor violation
are very strongly constrained.

Because at the LHC flavor information is difficult to obtain on light quarks, we use
an average squark mass for left and right handed scalars. The different handedness can
be distinguished through their appearance in different cascades. The right-handed squark
typically decays directly to the bino and a quark, while the left-handed squark has a sizeable
coupling to the wino, leading to the usual long decay chain. Unfortunately, in the currently
experimentally simulated LHC data set there is little information on the stop-chargino
sector [4^. Without this information, any combination of B physics data with high-p_T LHC
data will fall short. We postpone a detailed discussion of this problem to a later paper [45].
In the lepton sector electrons can easily be separated from muons. A possible unification of
the first two generations can then be determined from data [46].

The third-generation trilinear couplings A_τ,b can in principle play a role as off-diagonal
entries in the down-type mass matrices. However, they are multiplied by the corresponding
Yukawas and compete with the term μ tan β ~ (60 GeV)^2. Seeing effects of the trilinear
coupling would require A_τ,b > 1400 GeV, so for a low-tan β parameter point A_τ,b have no
impact on the likelihood around the correct or alternative best-fitting points. The same
A_τ,b appear as parameters in the computation of the light MSSM Higgs mass, but again
their effect is negligible compared to for example A_t [33]. There is a slim possibility that
the stau mixing angle and with it A_τ might be determined in cascade decays similarly to
the usual UED-SUSY spin analysis [47], but this analysis has not yet been experimentally
confirmed.

Properly including m_t this leads to the effective 19-dimensional weak-scale MSSM
parameter space listed for example in Tab. VI. Obviously, the assumption of parameters being
irrelevant for the MSSM likelihood map can and has to be tested. Moreover, the SFitter
analysis will show that more than just the trilinear A parameters turn out to be invisible at
the LHC.

In contrast to the MSUGRA model tan β is used as a parameter in the Higgs sector and
not B, because all MSSM parameters are defined at the weak scale assuming electroweak
symmetry breaking. In other words, tan β and m_A are the two Higgs-sector parameters in
the MSSM analysis. Looking at the currently confirmed LHC measurements none of the
heavy Higgs bosons with masses of the order O(m_A) would be seen in SPS1a, which is not
good news for the parameter determination in the Higgs sector.

Because computing the mass spectrum in the weak-scale MSSM does not require any
shift in scales, i.e. it does not involve renormalization group running or large logarithms, a
smaller theory error for the on-shell particle masses should be assumed. As a rough estimate
a relative error of 1% for the masses of strongly interacting particles and 0.5% for weakly
interacting particles [21, 22] is used, plus a 2% non-parametric error on the light MSSM
Higgs boson [33]. Just as in Sec. III B the correct flat theory errors, eq. (1), are used for the
determination of the errors on model parameters.



A. MSSM likelihood map 

SFitter approaches the problem of the higher-dimensional MSSM parameter space in 
analogy to the MSUGRA case, but now organized in four steps: 

1. First, SFitter produces a set of Markov chains over the entire parameter space. The
proposal function is constant, allowing the algorithm to cover the entire MSSM space
without focusing on the resolution of local likelihood maxima. Starting from the
best five points in these Markov chains Minuit resolves the local maxima in the
likelihood map. This procedure ensures that there is no bias from starting points in the
subsequent analysis. This step 1 can be repeated with different proposal functions,
depending on the purpose of the Markov chain SFitter computes.

2. In a second step the Markov chains and the additional high-resolution Minuit
algorithm are limited to the gaugino-higgsino subspace M_1, M_2, M_3, μ, tan β and m_t.
Again, the proposal function is flat, focusing on the scan for local maxima in the
likelihood map. For the 15 best local maxima in this subspace their resolution is improved
by Minuit.

3. For the best point(s) in the gaugino-higgsino subspace these coordinates are then
fixed. The step-3 Markov chain probes the additional scalar parameter space around
the local maxima in the gaugino-higgsino space, assuming a Breit-Wigner proposal
function with a width of 1% of the entire range in each direction. The resolution of
the five best points is improved by Minuit.

4. Finally, Minuit traces the correlations between the gaugino-higgsino parameter space
and the remaining scalar mass parameters. Once the global best-fitting parameter
point is identified the errors on all parameters are determined using the usual smeared
set of pseudo measurements and flat theory errors.
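The two building blocks used in these steps, a Markov chain with a flat or a Breit-Wigner proposal followed by a hill climber, can be illustrated with a toy sketch. This is not SFitter code: the two-dimensional likelihood, the ranges and all function names are invented for illustration, and a simple pattern search stands in for Minuit.

```python
import math
import random

def log_like(p):
    """Toy log-likelihood: global maximum at (2, 1), secondary maximum at (-2, -1)."""
    x, y = p
    main = math.exp(-((x - 2.0) ** 2 + (y - 1.0) ** 2) / 0.1)
    second = 0.5 * math.exp(-((x + 2.0) ** 2 + (y + 1.0) ** 2) / 0.1)
    return math.log(main + second + 1e-300)  # offset guards against log(0)

def markov_chain(n_steps, lo, hi, rng, start=None, width=None):
    """Metropolis chain with a flat proposal (width=None) or Breit-Wigner steps."""
    point = list(start) if start is not None else [rng.uniform(lo, hi) for _ in range(2)]
    value = log_like(point)
    chain = [(value, tuple(point))]
    for _ in range(n_steps):
        if width is None:
            trial = [rng.uniform(lo, hi) for _ in range(2)]
        else:
            # Breit-Wigner (Cauchy) distributed step around the current point
            trial = [c + width * math.tan(math.pi * (rng.random() - 0.5)) for c in point]
        if all(lo <= c <= hi for c in trial):
            trial_value = log_like(trial)
            # both proposals are symmetric, so the plain Metropolis rule applies
            if rng.random() < math.exp(min(0.0, trial_value - value)):
                point, value = trial, trial_value
        chain.append((value, tuple(point)))
    return chain

def hill_climb(point, step=0.5, tol=1e-6):
    """Simple pattern search, standing in for a minimizer such as Minuit."""
    point, best = list(point), log_like(point)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = log_like(trial)
                if value > best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return best, point

rng = random.Random(7)
chain = markov_chain(5000, -5.0, 5.0, rng)            # global scan, flat proposal
best_starts = sorted(set(chain), reverse=True)[:5]    # five best distinct points
maxima = [hill_climb(p) for _, p in best_starts]      # resolve the local maxima
best_value, best_point = max(maxima)
print(best_point)
```

Calling markov_chain with start=best_point and width=0.1, i.e. 1% of the range used here, gives the kind of local chain described in step 3.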

All steps in the SFitter strategy are either Markov chains to globally probe the parameter
space (with a flat or a Breit-Wigner proposal function), or a Minuit hill climber to identify
the likelihood maxima with high resolution. This approach can be applied to any problem
involving a high-dimensional parameter space, but the details of course have to be adjusted.

(label missing)   131.1   131.1   131.1   131.3   131.0   131.0   131.1   131.2
M_eL              186.3   186.4   186.4   186.5   186.2   186.2   186.4   186.4
M_eR              131.5   131.5   131.6   131.7   131.4   131.4   131.5   131.6
M_q3L             497.1   497.2   494.1   494.0   495.6   495.6   495.8   495.0
M_tR             1073.9   920.3   547.9   950.8   547.9   460.5   978.2   520.0
M_bR              497.3   497.3   500.4   500.9   498.5   498.5   498.7   499.6
M_qL              525.1   525.2   525.3   525.5   525.0   525.0   525.2   525.3
M_qR              511.3   511.3   511.4   511.5   511.2   511.2   511.4   511.5
A_t (-)          -252.3  -348.4  -477.1  -259.0  -470.0  -484.3  -243.4  -465.7
A_t (+)           384.9   481.8   641.5   432.5   739.2   774.7   440.5   656.9
m_A               350.3   725.8   263.1  1020.0   171.6   156.5   897.6   256.1
m_t               171.4   171.4   171.4   171.4   171.4   171.4   171.4   171.4

TABLE VI: List of the eight best-fitting points in the MSSM likelihood map with two alternative
solutions for A_t. All masses are given in GeV. The χ² value for all points is approximately the
same, so the ordering of the table is arbitrary. The parameter point closest to the correct point is
labeled as SPS1a.

The large number of maxima mapped out in the second step corresponds to the expectations
from the MSUGRA model: starting from the true parameter point an alternative
solution with a switched sign in μ should exist. In the MSSM the hierarchy of M_1, M_2 and
|μ| can be interchanged, which altogether can give O(10) distinct maxima in the likelihood
map. To allow for additional structures or several best points in the Markov chain to
correspond to the same local maximum, we increase the number of likelihood maxima returned
after step 2 to 15.

Last but not least, just as in the MSUGRA case alternative likelihood maxima triggered
by correlations between the rather poorly measured parameters A_t, tan β and the
right-handed stop mass are expected. One could imagine that secondary maxima appear in the
A_t-m_t plane, as happened in the MSUGRA case. However, this correlation is not clearly
visible in the MSSM because of a lack of direct measurements in the stop sector.

In analogy to the MSUGRA analysis general features of the log-likelihood map of the 
MSSM parameter space are studied before proceeding with profile likelihood or Bayesian 
probability distributions. Finally the proper error analysis is performed. The first question 
is the presence of alternative likelihood maxima in the MSSM parameter space. 

Tab. VI lists the secondary local maxima in the likelihood map, focusing on the
neutralino-chargino sector. These entries appear as distinct secondary maxima in step 2
of SFitter. Each of them goes through steps 3 and 4, where it is explicitly checked that for
a given value of m_t no secondary likelihood maxima in the scalar sector alone turn up. In
step 4 the resolution on the local maxima is improved and the residual correlations between
the neutralino-chargino and the scalar sectors are evaluated.

The most interesting feature in the different best-fitting points listed in Tab. VI is the
structure of the neutralino sector. For a fixed sign of μ four equally good solutions are
found, which can be classified by the ordering of the mass parameters: M_1 < M_2 < |μ|
is the correct MSUGRA-type solution. The reverse ordering of the two gaugino masses
M_2 < M_1 < |μ| is equally likely. In both cases the missing neutralino will be a higgsino.
Apart from these two light-gaugino scenarios the second-lightest neutralino can be mostly a
higgsino, which corresponds to M_1 < |μ| < M_2 and M_2 < |μ| < M_1. Note that given the set
of LHC measurements the two gaugino masses can always be switched as long as there are
no chargino constraints. The one neutralino which cannot be a higgsino is the LSP, because
in that case the μ parameter would also affect the second neutralino mass and would have
to be heavily tuned with the gaugino masses. Such a solution does not have a comparable
log-likelihood to the other 2×4 scenarios.

In spite of the different gaugino and higgsino contents, the physical masses of the three
visible neutralinos are the same in all points listed in Tab. VI, as is the precisely measured
light Higgs mass. The shift in tan β for the correct SPS1a parameter point is an effect of
the smeared data set combined with the rather poor constraints on this parameter and is
within the error bar (see later in this section).

Looking at Tab. II there is an important feature of the set of measurements: there are 22
measurements, counting the measurements involving m_l̃ separately for electrons and muons.
Using these naively it should be possible to completely constrain a 19-dimensional parameter
space. However, the situation is more complicated. These 22 measurements are constructed
from only 15 underlying masses. The additional measurements will resolve ambiguities and
improve errors, but they will not constrain any additional parameters. Looking at the set
of measurements and at Tab. IX with the errors, five model parameters turn out to be not
well constrained. One problem which has already been discussed is the heavy Higgs mass
m_A. The next poorly determined parameters are M_tR and A_t. These parameters occur in
the stop sector, but none of them appear in any of the edge measurements.

Moreover, there is no good direct measurement of tan β. Looking at the neutralino
and sfermion mixing matrices any effect in changing tan β can always be accommodated
by a corresponding change in another parameter. This is particularly obvious in the poorly
measured stau sector. There only the lighter of M_τL or M_τR is determined from the kinematic
endpoint of the ττ invariant-mass distribution. The heavier mass parameter and tan β can
compensate each other's effects freely. In contrast, the light-flavor slepton masses for all
maxima are identical. This is an effect of the cascade measurements, which very strongly
constrain the mass difference between the neutralinos and the light-flavor sleptons.

There is exactly one measurement which strongly links these otherwise unconstrained 
parameters, the mass of the lightest Higgs boson $m_h$. This leaves a four-dimensional surface
with a constant log-likelihood. As the dependence between the different parameters is highly 
non-linear, this limits the range in these parameters. Outside this surface the Higgs mass 
does not reach the measured value (or other elementary constraints like non-tachyonic stops 
are violated) no matter what the other parameters are. Therefore a meaningful error can 
still be assigned to at least some of the parameters, while others turn out to be basically
undetermined. 

The parameter points in Tab. VI should therefore be seen as a 'typical' set of different
solutions for these parameters. The common link, the lightest Higgs mass, illustrates the 
dependence on the individual parameters. 

To illustrate the effect of the minimum surface, two values for $A_t$ are quoted in the table
of minima. One of them appears as a solution of the minimization procedure, while the
other one is generated by an additional step where every parameter except $A_t$ is kept fixed.
The minimization is started from the original value for $|A_t|$ but with a flipped sign. This
procedure gives only one additional solution. The significant shift in $|A_t|$ shows the sizeable
correlations with the other parameters. Its origin is the stop contribution to the lightest
Higgs mass, which contains sub-leading terms linear in $A_t$. As a matter of fact, in other
supersymmetric parameter points where $\mu/\tan\beta$ is of the same order as $A_t$, much larger
terms linear in $A_t$ would appear, while in SPS1a the linear contributions of $A_t$ to $m_h$ are
strongly suppressed compared to the quadratic terms.

The two alternative solutions with flipped signs of $A_t$ are particularly interesting, since
two alternative MSUGRA solutions have already been observed in section III A. There,
the lack of measurements is compensated by the requirement of parameter unification at
the GUT scale. In the general MSSM an alternative solution exists even if all parameters
except for $A_t$ are kept fixed. If the four-dimensional minimum surface can be constrained
by further measurements, this degeneracy will vanish and correlations will require the other 
parameters to shift, in order to accommodate two distinct point-like minima. The prime 
candidate for such a shift is the top mass, as known from the SUGRA study. 

Technically, when searching for alternative local maxima in the log-likelihood map, it is much
easier to use Gaussian theory errors. Of course, this assumption is an approximation and
cannot be used to quote errors on the parameter points. Moreover, it can be misleading
when it comes to ranking the alternative solutions according to their log-likelihood. On
the other hand, switching from Gaussian to flat theory errors will only lead to a higher
degeneracy of the log-likelihood because of the flat behavior of $\chi^2$, and already for Gaussian
theory errors all alternative solutions are equally likely. Flat theory errors do not lead to
additional alternative likelihood maxima or structures in the likelihood map. In particular,
they do not change the statement that the lightest neutralino has to be a gaugino to explain
the cascade-decay measurements.

As discussed in the MSUGRA case, these different interpretations of the LHC data set 
could at least in part be disentangled by additional channels which should open for different 
'wrong' mass parameters. 

B. Alternative mass assignment 

Another test of general features of the MSSM likelihood, based only on best-fitting points,
is to exchange the two heavy neutralinos in the LHC measurements, as discussed in
section III for MSUGRA. For this comparison the time-consuming error estimate at the
end of step 4 is neglected and the log-likelihood values for the two best-fitting points are
compared. The results for the two fits with the correct and swapped neutralino mass
assignments are shown in Tab. VII. After the discussion in the last section it is not surprising
Parameter               SPS1a     correct   inverted
$M_1$                   103.1     102.1     101.6
$M_2$                   192.9     193.6     191.0
$M_3$                   577.9     582.0     582.1
$\tan\beta$              10.0       7.2       7.8
$m_A$                   394.9     394.0     299.3
$\mu$                   353.7     347.7     369.3
$M_{\tilde e_L}$        194.4     192.3     192.3
$M_{\tilde e_R}$        135.8     134.8     134.8
$M_{\tilde\mu_L}$       194.4     191.0     191.0
$M_{\tilde\mu_R}$       135.8     134.7     134.7
$M_{\tilde\tau_L}$      193.6     192.9     185.7
$M_{\tilde\tau_R}$      133.4     128.1     129.9
$M_{\tilde q_L}$        526.6     527.0     527.1
$M_{\tilde q_R}$        508.1     514.8     514.9
$M_{\tilde q3_L}$       480.8     477.9     478.5
$M_{\tilde t_R}$        408.3     423.6     187.6
$M_{\tilde b_R}$        502.9     513.7     513.2
$A_{l1,2}$             -251.1     fixed     fixed
$A_\tau$               -249.4     fixed     fixed
$A_{d1,2}$             -821.8     fixed     fixed
$A_b$                  -763.4     fixed     fixed
$A_{u1,2}$             -657.2     fixed     fixed
$A_t$                  -490.9    -487.7    -484.9
$m_t$                   171.4     172.2     172.2

TABLE VII: Result for the MSSM parameter determination using the LHC endpoint measurements
assuming either the third or fourth neutralino to be missing. The log-likelihood for both points is
almost identical. All masses are given in GeV.

that the likelihood for the two hypotheses in their best-fitting points is not significantly 
different. There are small shifts in all parameters entering the neutralino mass matrix, but 
none of them appear significant. The central values for the four neutralino masses move 
from {98.5, 175.7, 353.5, 374.9} GeV to {98.5, 175.8, 375.0, 393.3} GeV. The correctly iden- 
tified fourth neutralino in the first set has the same mass as the third neutralino in the 
swapped case. 

The consistent shift in the extracted value of $\tan\beta$ is an effect of the smeared parameter
point. The relatively large shift in the heavy Higgs mass between the two scenarios looks
more dramatic than it is. When taking into account the error on this parameter shown in
Tab. IX, this shift will turn out to be well within the error bands and largely reflect different
starting values combined with a flat log-likelihood distribution in $m_A$. Even though the
heavy Higgs mass is vastly different between the two cases, the light Higgs mass in both
best-fitting points is identical. This means that for the typical LHC precision the parameter
point SPS1a is in the decoupling limit of the heavy MSSM Higgs states.

It might be possible to search for higgsinos in cascade decays involving gauge bosons. Such 
a measurement could remove this degeneracy, namely the mis-identification for example of 
three out of four neutralinos. The same would be true if chargino masses could be included 
in the analysis, which are not part of the standard SPS1a sample [44].

C. Profile likelihood and Bayesian probability 

The organization of SFitter in the MSSM case implies that it is not possible to produce 
a high-resolution Markov chain for the entire 19-dimensional MSSM parameter space. The 
only Markov chain covering the entire space is obtained at the end of step 1, and will 
be fairly coarse. On the other hand, a dense-coverage log-likelihood map of the MSSM 
parameter space as for the MSUGRA space cannot be produced because of the large number 
of dimensions. This means that the analysis has to follow two paths in parallel, namely the
analysis of global features using a Markov chain and the analysis of local features using 
additional Minuit-type algorithms described in the Appendix. 

The Markov chain produced in step 1 covers the entire MSSM parameter space. It 
should be used to compute lower-dimensional profile likelihoods or Bayesian probability 
distributions, following the discussion in Sec. III. The problem is that to guarantee coverage
of the entire MSSM parameter space a flat proposal function is used, which reduces the
acceptance probability below the per-mille level. This acceptance rate is fine for the intended 
purpose, namely to define an unbiased starting point for the maximum searches while making 
sure that no regions of parameter space are missed. In a repeat of step 1 a more appropriate 
proposal function can be used, for example a Breit-Wigner shape, with a width of one 
percent of the total parameter range in each direction. 

A slight technical complication is that weighted Markov chains require an accurate
estimate of the size of excluded regions, i.e. regions with zero likelihood. For example, the
measurement of a mass difference in Tab. II includes the sign of this mass difference. Parameter
points with an inverted mass hierarchy are assigned zero likelihood, which means one
measurement can remove half of the entire parameter space. This feature of the kinematic
endpoints reduces the relative volume of valid points in the exclusive log-likelihood map to a
very small fraction and introduces large absolute errors on the determined size of this
fraction. At this stage, these statistical fluctuations dominate the behavior of the marginalized
Bayesian probabilities. To illustrate the log-likelihood map the number of points per bin, i.e. 
the traditional Markov chain algorithm, is used. For a small fraction of allowed parameter 
points this distribution is statistically more stable. As a drawback, only the relative size of 
entries in the log-likelihood map is significant. 

In Fig. 7 the marginalized Bayesian pdf is shown for selected MSSM parameters using
an exclusive likelihood map with a Breit-Wigner proposal function. The two-dimensional
$M_1$-$M_2$ plane shows two branches, where one of the two gauginos has to form the lightest
neutralino. The second-lightest neutralino can be either a gaugino or a higgsino. In the
latter case the gaugino mass which does not fix the LSP mass can either determine the last
remaining neutralino mass or it can essentially decouple. In the two-dimensional distribution
a decoupled $M_1$ corresponds for example to small $M_2$ giving the correct LSP mass and a
higgsino-like second-lightest neutralino. In the one-dimensional distribution for $M_1$ there is
a broad peak at the correct value, and a washed-out extended tail to large values. This tail
is not a noise effect, but corresponds to the described decoupling. The same $M_1$ distribution
computed as a profile likelihood illustrates the problem with the Markov chain from step 1: 
in comparison to the Bayesian pdf from the non-weighted Markov chain the profile likelihood 
is dominated by noise. 

The selectron and the wino masses in the second panel of Fig. 7 are uncorrelated, which in
retrospect justifies the 4-step organization of SFitter. Because of the explicit appearance of
the gluino-sbottom mass difference in the list of measurements, Tab. II, the gaugino-higgsino
sector and the scalar sector are, if at all, correlated through the gluino, which means that
$M_3$ could as well be held fixed in step 2. This has little effect on the final result, but the
gluino-sbottom correlation will be the dominant effect in step 4 of the SFitter strategy.

FIG. 7: Marginalized Bayesian likelihoods (first three panels) and profile likelihood (bottom-right
panel) for the complete MSSM parameter space (step 1) from SFitter. A Breit-Wigner proposal 
function is used to produce a Markov chain with 10^ points. 

Given the lack of correlations between the neutralino-chargino sector and the scalar sector
illustrated by Fig. 7, information from the Markov chain can be extracted in the
neutralino-chargino sector which SFitter computes in step 2. Fixing all scalar parameters is equivalent
to scanning them over their orthogonal parameter space, provided the correlations between
the sectors are negligible, i.e. the dimensions of the parameter space are indeed orthogonal.
In Fig. 8 profile likelihoods (as defined in Sec. III) are shown for $M_{1,2}$ and $\mu$. In the $M_1$-$M_2$
plane the same structure as in Fig. 7 is observed: one of the two gaugino masses corresponds
to the measured LSP mass while the other gaugino mass can in principle decouple. In the
$M_{1,2}$-$\mu$ plane the three neutralino masses can be identified in the $M_{1,2}$ directions. For
light $M_{1,2}$ the higgsino mass parameter can be large, while for one heavy gaugino $\mu$ is
constrained to be small.

The one-dimensional profile likelihood, for example for $M_1$, again shows these three options
with peaks around 100, 200 and 350 GeV, corresponding to the three measured neutralino
masses. The peak above 400 GeV is an alternative log-likelihood maximum which does not
correspond to a measured neutralino mass. For $M_2$ there is again the 100 GeV peak, where
the LSP is a wino. The correct solution around 200 GeV is merged with the first maximum,
while the third peak around 300 GeV corresponds to at least one light higgsino. In the
profile likelihood for $\mu$ the two signs of $\mu$ both produce reasonable results. The 100 GeV
range does not show a distinctive peak because it would require the two lightest neutralinos
to be higgsinos, which means a high degree of tuning in all other parameters. However, peaks
around 200 GeV and in the heavy-neutralino range are clearly observed for both signs of $\mu$.

Because the Markov chains for the neutralino-chargino sector are distinct, no information 
on the correlations between the two sectors after step 1 of our SFitter strategy is available. 
Using only the scalar-sector Markov chain from step 3 a small correlation is present in 
the two scalar masses occurring in the squark cascades. They are in principle slightly 

FIG. 8: Profile likelihoods for the MSSM from SFitter. The distributions of the neutralino sector
are derived from the log-likelihood map of the neutralino sector alone, using the Markov chain 
after step 2 in the SFitter strategy. 



correlated through the kinematic endpoints from the left-handed squark decay, but noise 
effects numerically dominate the profile likelihood. The one-dimensional profile likelihood 
for the squark mass parameter, however, is clearly peaked around the correct value. 

The combination of these two Markov chains is of course not suited to extract properly 
normalized probability distributions, because the scalar sector is simply fixed to some best- 
fit values out of step 1. On the other hand, these incomplete Markov chains show that our 
likelihood map for the MSSM parameter space works and contains the relevant structures, 
but that after step 1 it is somewhat noisy. 

In addition to the profile likelihoods shown in Fig. 8, SFitter also provides Bayesian

FIG. 9: Marginalized Bayesian probabilities for the MSSM from SFitter. The distributions of the
neutralino sector are derived from the log-likelihood map of the neutralino sector alone, using the 
Markov chain after step 2 in the SFitter strategy. 



probability distributions. For the details of both approaches see section III. While the
structures in the two-dimensional $M_1$-$M_2$ plane in Fig. 9 are similar to the profile likelihood,
the one-dimensional histograms show two significant differences: first, the Bayesian pdf for
$M_{1,2}$ shows the same three physical solutions as the corresponding profile likelihood, namely
one peak around 100 GeV, another one around 200 GeV, separated only by one bin from
the edge of the 100 GeV peak, and a heavy-neutralino peak above 300 GeV (more visible
for $M_1$). However, the peaks in the Bayesian pdf are much wider, as expected from the
discussion of the MSUGRA case. The two lower peaks in $M_2$ even appear as one, with a
maximum around 150 GeV, which is a typical Bayesian volume effect.

The second difference between the profile likelihood and the Bayesian pdf is that the 
Bayesian pdf can answer the question of which neutralino is the most likely to be bino-like.
Note that only the neutralino-chargino Markov chain from step 2 is used, so the probabilistic
interpretation has to be taken with a grain of salt. However, while $M_1$ has a best
profile-likelihood entry around 350 GeV, the Bayesian pdf shows a clear maximum around
100 GeV. As usual, SFitter leaves the interpretation of the two different approaches to the
reader. 

As expected, the difference between the two signs of $\mu$ is small, but both of them are
driven to small values of $|\mu|$, again by volume effects. This arises because of the decoupling of
one of the two gaugino masses for a light higgsino, while for two light gauginos the higgsino 
mass is still determined by the fourth neutralino. The squark mass as a comparably well- 
measured and less noise-dominated parameter shows the kind of behavior known from the 
MSUGRA case: the profile likelihood is much more strongly peaked than the Bayesian pdf. 
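
The different behavior of the two projections can be reproduced with a toy example. The following sketch is not SFitter code: the two-peak likelihood, the grid and all function names are assumptions made purely for illustration. It projects a two-dimensional likelihood with a tall, narrow peak and a lower, much wider peak onto one axis, once by taking the maximum in each bin (profile likelihood) and once by integrating over the second parameter with a flat prior (marginalized Bayesian pdf).

```python
import math

def toy_likelihood(x, y):
    """Hypothetical 2D likelihood: a tall narrow peak plus a lower, much wider one."""
    narrow = 1.0 * math.exp(-((x - 350.0) ** 2 + (y - 200.0) ** 2) / (2 * 5.0 ** 2))
    wide = 0.3 * math.exp(-((x - 100.0) ** 2 + (y - 200.0) ** 2) / (2 * 40.0 ** 2))
    return narrow + wide

def project(xs, ys):
    """Return (profile, marginal) over y for each x bin centre."""
    profile, marginal = [], []
    dy = ys[1] - ys[0]
    for x in xs:
        values = [toy_likelihood(x, y) for y in ys]
        profile.append(max(values))        # profile likelihood: best point in the bin
        marginal.append(sum(values) * dy)  # flat-prior marginalization: integrate over y
    return profile, marginal

xs = [10.0 * i for i in range(0, 51)]      # 0 ... 500, think of a mass parameter in GeV
ys = [2.0 * i for i in range(0, 201)]      # 0 ... 400
profile, marginal = project(xs, ys)
best_profile_x = xs[profile.index(max(profile))]
best_marginal_x = xs[marginal.index(max(marginal))]
print(best_profile_x, best_marginal_x)
```

In this toy the profile likelihood picks the narrow peak, while the marginalized distribution prefers the wide one because of its larger volume, which is the kind of volume effect described above.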

D. Precision Analysis 

Similarly to the MSUGRA case, one of the most important outcomes of the SFitter 
parameter extraction is the proper definition of the errors of all extracted model parame- 
ters. The flat theory errors are now only weak-scale uncertainties, for example due to the 
translation of mass parameters into physical masses or due to higher-order effects in the 
observables. Compared to the MSUGRA case a proper error analysis in the MSSM is even 
more important: the errors at the end of the day will determine if and how well we can 
extract information on the SUSY breaking mechanism. 

1. Errors on MSSM parameters 

For the best-fit parameter point, we show the results for the error determination in
Tables IX and VIII. The general feature is that the LHC is not sensitive to several parameters.
Some of them, namely the trilinear mixing terms $A_i$, are fixed in the fit. Others, like the
heavier stau-mass and stop-mass parameters or the pseudoscalar Higgs mass, turn out to be
unconstrained. In the stau sector only the lighter of the two mass eigenstates is observed in
Tab. II. Because of the non-zero mixing between the two staus, the relative error on the mass
parameter is much larger than the experimental error on the lighter stau mass. Because the
heavy Higgses are for all practical purposes decoupled at the LHC, the parameters in the
Higgs sector are $\tan\beta$ and the lightest stop mass. Because the sbottom masses are known
from the gluino cascade decay, the stop mass matrix has two remaining free parameters. 

As expected in the slepton sector, the ILC improves the precision by an order of
magnitude in the parameters, be it with or without theory errors. Again the ILC alone, where
parameters can be measured, dominates the precision. It is instructive to compare the effect
of theory errors on the parameter determination. While the ILC loses a factor 5 in precision,
going from a per-mille determination to half a percent, the LHC loses less than
a factor 2. The naive expectation would have called for only the ILC measurement being
affected. However, since the LHC measurements are functions of several sparticle masses, the
error propagation also leads to a significant theory error (Table II). In particular the $\ell\ell$
mass theory error is larger than the experimental error. The strength of the LHC is clearly
visible in the sector of sparticles with color quantum numbers. 
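
How an uncertainty on individual masses feeds into an endpoint can be sketched with the standard dilepton edge of the decay chain second neutralino to slepton to lightest neutralino, $m_{\ell\ell}^{\max} = \sqrt{(m_{\chi_2}^2 - m_{\tilde\ell}^2)(m_{\tilde\ell}^2 - m_{\chi_1}^2)}/m_{\tilde\ell}$. The example below is only an illustration and not the SFitter treatment of flat theory errors: the neutralino masses are the central values quoted above, while the physical slepton mass and the one-percent uncertainty on each mass are assumed values.

```python
import math

def mll_edge(m_chi2, m_slep, m_chi1):
    """Dilepton edge for chi2 -> slepton lepton -> chi1 lepton lepton (masses in GeV)."""
    return math.sqrt((m_chi2**2 - m_slep**2) * (m_slep**2 - m_chi1**2)) / m_slep

# Neutralino masses as quoted in the text; the slepton mass is an assumed value.
masses = {"m_chi2": 175.7, "m_slep": 143.0, "m_chi1": 98.5}
REL_THEORY_ERROR = 0.01  # assumed 1% uncertainty on each mass (illustrative only)

def edge_shifts(masses, rel_err):
    """Shift of the edge when each mass is moved up by its assumed uncertainty."""
    central = mll_edge(**masses)
    shifts = {}
    for name, value in masses.items():
        varied = dict(masses)
        varied[name] = value * (1.0 + rel_err)
        shifts[name] = mll_edge(**varied) - central
    return central, shifts

central, shifts = edge_shifts(masses, REL_THEORY_ERROR)
print(f"edge = {central:.1f} GeV")
for name, shift in shifts.items():
    print(f"  {name}: {shift:+.2f} GeV")
```

Because all three masses enter the same observable, an uncertainty on any one of them moves the edge, and the individual shifts have different signs and sizes.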

TABLE VIII: Result for the general MSSM parameter determination in SPS1a assuming vanishing
theory errors. As experimental measurements the kinematic endpoint measurements given in
Tab. II are used for the LHC column, and the mass measurements given in Tab. I for the ILC
column. In the LHC+ILC column these two measurement sets are combined. Shown are the
nominal parameter values and the result after fits to the different data sets. All masses are given
in GeV.



While for the LHC and ILC separately not all parameters can be determined, the
combination of the two machines allows us to determine all parameters (with the exception of
the first and second generation trilinear couplings) with good precision. The combination of 
LHC and ILC measurements can be particularly useful to determine the link to dark-matter 
observables [18, 48, 49, 50, 51].



2. Testing unification 

Once the parameters of the weak-scale MSSM-Lagrangian have been determined, the 
next step is to extrapolate the parameters all the way to the Planck scale. Inspired by the 
apparent unification of the gauge couplings [5^ in the MSSM the question arises if any other 
running parameters unify at a higher scale as shown in the pioneering work in [sil, 53 1 . Such 
structures can give hints for example about supersymmetry-breaking. For two reasons, the 

Parameter             LHC                ILC                 LHC+ILC            SPS1a
$M_{\tilde\mu_L}$     193.2 ± 8.8        194.5 ± 1.3         194.5 ± 1.2        194.4
$M_{\tilde\mu_R}$     135.0 ± 8.3        135.9 ± 0.87        136.0 ± 0.79       135.8
$M_{\tilde e_L}$      193.3 ± 8.8        194.4 ± 0.91        194.4 ± 0.84       194.4
$M_{\tilde e_R}$      135.0 ± 8.3        135.8 ± 0.82        135.9 ± 0.73       135.8
$M_{\tilde q3_L}$     481.4 ± 22.0       499.4 ± O(10^2)     493.1 ± 23.2       480.8
$M_{\tilde t_R}$      415.8 ± O(10^2)    434.7 ± O(4·10^)    412.7 ± 63.2       408.3
$M_{\tilde b_R}$      501.7 ± 17.9       fixed 500           502.4 ± 23.8       502.9
$M_{\tilde q_L}$      524.6 ± 14.5       fixed 500           526.1 ± 7.2        526.6
$M_{\tilde q_R}$      507.3 ± 17.5       fixed 500           509.0 ± 19.2       508.1
$A_\tau$              fixed              613.4 ± O(10^)      764.7 ± O(10^)    -249.4
$A_t$                 -509.1 ± 86.7      -524.1 ± O(10^)     -493.1 ± 262.9    -490.9
$A_b$                 fixed              fixed               199.6 ± O(10^)    -763.4
$A_{l1,2}$            fixed              fixed               fixed             -251.1
$A_{u1,2}$            fixed              fixed               fixed             -657.2
$A_{d1,2}$            fixed              fixed               fixed             -821.8
$m_A$                 406.3 ± O(10^3)    393.8 ± 1.6         393.7 ± 1.6        394.9
$\mu$                 350.5 ± 14.5       354.8 ± 3.1         354.7 ± 3.0        353.7
$m_t$                 171.4 ± 1.0        171.4 ± 0.12        171.4 ± 0.12       171.4

TABLE IX: Result for the general MSSM parameter determination in SPS1a assuming flat theory
errors. As experimental measurements the kinematic endpoint measurements given in Tab. II are
used for the LHC column, and the mass measurements given in Tab. I for the ILC column. In the
LHC+ILC column these two measurement sets are combined. Shown are the nominal parameter
values and the result after fits to the different data sets. All masses are given in GeV.



prime candidates for unification in supersymmetry are the gaugino masses: first, in contrast 
to the scalar masses, the three gaugino masses can well be argued to belong to the same 
sector of physics, being the partners of gauge bosons of a possibly unified gauge group. 
Secondly, interactions between the hidden SUSY-breaking sector and the MSSM particle 
content can affect the unification pattern, in particular for scalars. In that case, scalar mass 
unification might be replaced by much less obvious sum rules for scalar masses at some high
scale.

Technically, upwards running is considerably more complicated [20, 55] than starting from
the unification scale and testing the unification hypothesis by comparing to the weak-scale
particle spectrum. For example, it is by no means guaranteed that the renormalization group 
running will converge for weak-scale input values far away from the top-down prediction. 
In Figure 10 the extrapolation of the central values of the gaugino mass parameters is shown
using SuSpect. As expected in SPS1a, the mass parameters unify at the GUT scale. This
figure is only a proof of concept for the SFitter approach to testing unification. A full study 

FIG. 10: SFitter/SuSpect output for the upward renormalization group running of the three
gaugino masses in the MSSM, as a function of log(Q). The central values are shown without error
bars; a more detailed study of bottom-up running is beyond the scope of this paper [55].



of the extrapolation to the high scale including error estimates is beyond the scope of this
paper [55].
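
The qualitative behavior of the upward running can be checked with a back-of-the-envelope estimate. At one loop the ratio $M_i/\alpha_i$ is scale independent, so each gaugino mass simply rescales with its gauge coupling. The sketch below uses this relation only; it is not the SuSpect running used for the figure. The coupling values at $M_Z$, the input scale of 1 TeV, the unification scale and the use of MSSM beta coefficients over the whole range are all simplifying assumptions, and the weak-scale gaugino masses are the nominal SPS1a values of Tab. VII.

```python
import math

M_Z = 91.19                        # GeV
# Approximate inverse gauge couplings at M_Z (GUT-normalized U(1)); assumed inputs.
ALPHA_INV_MZ = (59.0, 29.6, 8.5)
B_MSSM = (33.0 / 5.0, 1.0, -3.0)   # one-loop MSSM beta coefficients

def alpha_inv(i, q):
    """One-loop inverse coupling alpha_i^{-1}(Q), using MSSM running above M_Z."""
    return ALPHA_INV_MZ[i] - B_MSSM[i] / (2.0 * math.pi) * math.log(q / M_Z)

def run_gaugino_mass(i, m_start, q_start, q_end):
    """At one loop M_i / alpha_i is constant, so M_i rescales with alpha_i."""
    return m_start * alpha_inv(i, q_start) / alpha_inv(i, q_end)

Q_WEAK = 1.0e3                     # assumed scale of the weak-scale input, GeV
Q_GUT = 2.0e16                     # assumed unification scale, GeV
weak_scale = (103.1, 192.9, 577.9) # nominal M_1, M_2, M_3 in GeV

high_scale = [run_gaugino_mass(i, m, Q_WEAK, Q_GUT) for i, m in enumerate(weak_scale)]
spread = (max(high_scale) - min(high_scale)) / (sum(high_scale) / 3.0)
print([round(m, 1) for m in high_scale], round(spread, 3))
```

Even in this crude approximation the three masses, which differ by more than a factor of five at the weak scale, come out close to each other at the high scale. A quantitative statement requires the full running and the error propagation discussed above.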



V. OUTLOOK 

If the LHC is successful in discovering physics beyond the Standard Model, the focus 
of its running will be on the interpretation of this new physics, identifying the ultravio- 
let completion of the Standard Model. The situation would be similar to current fits to 
electroweak precision data, flavor-physics data and dark-matter constraints, but likely con- 
siderably more complex. This increase in complexity is a challenge to the statistical tools 
employed to study high-dimensional physics parameter spaces. 

SFitter translates measurements for example of new particles' masses into information 
on the weak-scale Lagrangian. It uses a combination of (weighted) Markov chains and 
modified Minuit algorithms. The roughly 20-dimensional and highly correlated weak-scale
MSSM parameter space can be controlled by SFitter. The correct description of all errors
is a challenge for any high-dimensional parameter determination. However, especially to
distinguish different new-physics models, a proper error propagation is crucial. Therefore,
SFitter includes the proper treatment of statistical and systematic experimental errors as
well as (flat) theory errors, including arbitrary correlations.

As examples, two physics models, the low-dimensional toy model MSUGRA and the
effectively 19-dimensional MSSM, are analyzed in detail. SFitter first produces an
unintegrated log-likelihood map using Markov-chain techniques. For both models this likelihood
map is studied and distinct local maxima are identified, which SFitter resolves using
modified Minuit algorithms. For the best-fitting parameter points the errors on the extracted
model parameters are determined, properly including all experimental and theory errors.

Alternative maxima are for example due to the sign of the higgsino mass parameter, to 
the structure of the neutralino mass matrix, or to a correlation between the top Yukawa 
and the trilinear mixing parameter. While for MSUGRA these local maxima correspond 
to different values of the log-likelihood, they are degenerate in the MSSM and cannot be 
resolved using the relative likelihood values.

Following either a profile likelihood or a Bayesian probability approach SFitter then
computes lower-dimensional likelihood/probability distributions. For MSUGRA as well as for
the MSSM distributions in one and two dimensions are shown, illustrating the strengths and 
weaknesses of each of the two approaches. In the MSSM parameter space the complete log- 
likelihood map is complemented by corresponding maps over the approximately orthogonal 
gaugino-higgsino and scalar parameter spaces. Such analyses of lower-dimensional spaces 
lead to a less noisy likelihood map and can be useful in addition to the completely exclusive 
likelihood map. 

The determination of the parameters of the weak-scale Lagrangian and of their errors from
the LHC and the ILC is an essential ingredient to test unification. The SFitter approach
is not limited to studies of the supersymmetric parameter space. It can and will be used to
study any problem that involves mapping high-dimensional measurement and parameter spaces
in the LHC era.

APPENDIX A: WEIGHTED MARKOV CHAINS

Markov-chain Monte Carlos (MCMCs) have for a long time been a tool to evaluate 
functions for systems with a very large number of degrees of freedom. An example in BSM 
physics would be the prediction of a distribution for squark-gluino cross sections at the 
LHC, given the currently available data and a supersymmetric model [s^]. Computing 
LHC cross sections involves integrating over parton densities and is therefore expensive. 
Similarly, one can predict distributions of dark-matter detection rates given the current 
data, which again is a fairly expensive computation for each parameter point. The role of 
the MCMC is to provide us with a representative sample of parameter points, where in our 
case 'representative' is defined by the likelihood $p(d|m)$ describing the probability of a model
parameter point being correct given our LHC data. In general, this can be any normalized
probability distribution $p(m)$.

We produce a sample which with respect to $p(m)$ is a smaller copy of the complete
parameter space using the Metropolis-Hastings algorithm [56]. This algorithm is nothing but
an iterative chain of decisions on whether a new point is accepted as part of the Markov chain.
As long as the probability of proposing a point $m'$ while sitting in $m$ is the same as the
probability of proposing $m$ while sitting in $m'$, the decision whether the new point gets
accepted depends solely on the values $p(m)$ and $p(m')$ of the probability we want to map:
if $p(m') > p(m)$, the new point is accepted; otherwise it gets accepted with the probability
$p(m')/p(m)$. Once this decision is made, the next parameter point $m''$ is proposed, starting
from either $m$ or $m'$.
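
The accept/reject decision just described can be written down in a few lines. The following is a minimal one-dimensional sketch with a symmetric proposal, not the SFitter implementation: the Gaussian target, the uniform proposal, the step size and all names are assumptions for illustration.

```python
import math
import random

def metropolis(log_p, start, step, n_steps, rng):
    """Metropolis sampling of a 1D density with a symmetric (uniform) proposal."""
    chain = [start]
    current, current_logp = start, log_p(start)
    for _ in range(n_steps):
        proposal = current + rng.uniform(-step, step)  # same probability both ways
        proposal_logp = log_p(proposal)
        # Accept if more probable, otherwise with probability p(m')/p(m).
        if (proposal_logp >= current_logp
                or rng.random() < math.exp(proposal_logp - current_logp)):
            current, current_logp = proposal, proposal_logp
        chain.append(current)  # a rejected proposal repeats the old point
    return chain

def log_gauss(x):
    """Hypothetical target: log of a unit-width Gaussian centred at 100."""
    return -0.5 * (x - 100.0) ** 2

rng = random.Random(1)
chain = metropolis(log_gauss, start=95.0, step=2.0, n_steps=20000, rng=rng)
mean = sum(chain) / len(chain)
print(round(mean, 2))
```

Note that a rejected proposal adds another copy of the current point to the chain; this multiplicity is what makes the density of chain entries follow $p(m)$.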

The proposal probability is the probability $q(m \to m')$ with which we find new points
which then get suggested as new entries in the Markov chain. Its choice is an internal choice
in the Metropolis-Hastings algorithm, but can have a huge impact on the efficiency of
probing the model-parameter space. For example, dark-matter constraints are notoriously
difficult, because they generate narrow ridges in $p(m)$ which are not aligned with any of
the model parameters. LHC measurements for example are less restrictive, but
more likely to develop distinct local maxima. The proposal function must be able to jump
back and forth between these hills efficiently. For example a Gaussian distribution, which is
indeed symmetric between the starting point and the target point, has tails that are too
strongly suppressed to cover the MSSM parameter space. We could instead add a constant to the
proposal probability, or use a Breit-Wigner distribution. In the more general case where
the proposal distribution is not symmetric, the decision for a new point is not based on
$p(m')/p(m)$, but on $[p(m')\, q(m' \to m)]/[p(m)\, q(m \to m')]$. The only two requirements on
the choice of $q(m \to m')$ are that the proposal probability cannot have a memory of the
earlier points in the Markov chain (detailed balance), and any point must have a non-zero 
probability of being proposed after a finite number of steps. The latter ensures coverage of 
the whole parameter space. The proposal function can for example be symmetric in m and 
m' or it can be independent of m' altogether. The efficiency for building a useful Markov 
chain is of course closely linked to the efficiency of finding new parameter points which get 
accepted with a reasonable probability. Generally, an acceptance rate of 25% is considered an
optimal choice.
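
The difference between the two proposal shapes can be quantified by the probability of a long jump. The sketch below compares the two-sided tail probabilities of a Gaussian and of a Breit-Wigner (Cauchy) distribution in units of their width parameter. Identifying the Breit-Wigner width with the Cauchy half-width at half-maximum is an assumption of this illustration.

```python
import math

def gaussian_tail(k):
    """Probability that a Gaussian proposal jumps further than k standard deviations."""
    return math.erfc(k / math.sqrt(2.0))

def breit_wigner_tail(k):
    """Same for a Breit-Wigner (Cauchy) proposal, with k in units of its half-width."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)

# With a width of one percent of the parameter range, k = 50 is a jump across half the range.
for k in (1, 5, 20, 50):
    print(k, gaussian_tail(k), breit_wigner_tail(k))
```

For a width of one percent of the range, the Breit-Wigner proposal still attempts a jump across half the range roughly once in a hundred steps, while for the Gaussian such jumps are numerically indistinguishable from zero.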

In comparison to the usual Markov chain, the problem we are tackling with SFitter 
is simpler: we are only interested in the likelihood of some LHC measurement given a 
parameter point in our model, interpreted as a map over the model's parameter space: 
$p(m) = p(d|m)$. Starting from this likelihood map we can either compute profile
likelihoods of lower-dimensional parameter spaces or a Bayesian posterior probability
distribution $p(m|d)$. This means that naively we would produce a representative sample with
respect to this probability $p(m)$, then evaluate again the same probability $p(m)$, add an
integration measure or find the profile likelihood, bin it, and obtain a likelihood or probability
distribution in a subspace of the complete vector $m$. To save computing time we should obviously
retain the probability of each point in the Markov chain, similar to a phase-space Monte 
Carlo where we produce weighted events for integrated cross sections. 

To briefly illustrate the possible gain in efficiency consider a binary system, where each
parameter point enters one of two bins and the probability $p$ of the two bins is divided as
10% : 90%. We need at least 10 unweighted entries in the Markov chain to get the correct 
answer for the first time. Until then the probability associated with the first bin will be 
either zero or too large. If we use weighted events, two entries can already be sufficient, and 
each additional entry can improve the error on our extraction of the relative probability. 
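
The binary example can be sketched in a few lines of Python. The weighted estimate uses the per-bin inverse averaging that is defined in the following paragraphs; the bin labels, function names and the two-entry chain are hypothetical.

```python
def unweighted_estimate(chain):
    """Fraction of chain entries that fall into bin 'A'."""
    return chain.count("A") / len(chain)

def weighted_estimate(chain, p):
    """Relative probability of bin 'A' from the stored weights.

    Per bin we form (number of entries) / (sum of 1/p over the entries),
    which returns p itself whenever p is constant inside the bin.
    """
    per_bin = {}
    for b in set(chain):
        n = chain.count(b)
        per_bin[b] = n / sum(1.0 / p[b] for _ in range(n))
    return per_bin.get("A", 0.0) / sum(per_bin.values())

p = {"A": 0.1, "B": 0.9}    # the 10% : 90% split from the text
short_chain = ["B", "A"]    # hypothetical first two chain entries

print("unweighted:", unweighted_estimate(short_chain))
print("weighted:  ", weighted_estimate(short_chain, p))
```

With only two entries the unweighted count gives 50% for the first bin, while the weighted estimate already reproduces the 10% share up to rounding.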

Obviously, we cannot just keep the weight for each point in the Markov chain and multiply 
it into the binning procedure, since this would double-count this weight. Instead, we use a 
modified form of binning [58]. We first consider the case that $p \neq 0$ everywhere and then
generalize this result to also include regions with p = 0. 

We define an inverse averaging in each bin as

$$ P_{\rm bin}(p \neq 0) = \frac{\text{bincount}}{\sum_{i=1}^{\text{bincount}} p_i^{-1}} , \qquad \text{(A1)} $$

where the sum in the denominator is over all points in the Markov chain which fall into
this bin, counted with their correct multiplicity. It is easy to see that this gives the right
answer. The numerator can be written as $\sum_{i=1}^{\text{bincount}} 1$. Now we take the limit of infinitely
many points, so both sums turn into an integral

$$ P_{\rm bin}(p \neq 0) = \frac{\int dx\, w(x) \cdot 1}{\int dx\, w(x)/p(x)} , \qquad \text{(A2)} $$

where $w(x)$ is an arbitrary weight function with $\int dx\, w(x) = 1$. We choose $w(x) = p(x)$
and obtain the desired result 

$$ P_{\rm bin}(p \neq 0) = \frac{1}{V(p \neq 0)} \int dx\, p(x) , \qquad \text{(A3)} $$

where $V(p \neq 0)$ is the volume of the bin in the parameter space.
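
The inverse averaging of Eq. (A1) can be checked numerically on a toy case. The sketch below assumes a strictly positive density $p(x) = 0.5 + x$ on $[0, 1]$ and a uniform proposal that is independent of the current point; both choices, and all names, are illustrative assumptions and not part of SFitter.

```python
import random

def p(x):
    """Hypothetical density on [0, 1]; strictly positive and normalized."""
    return 0.5 + x

def weighted_chain_bins(n_steps, n_bins, seed=3):
    """Metropolis chain on [0, 1] with inverse averaging per bin, Eq. (A1)."""
    rng = random.Random(seed)
    x, px = 0.5, p(0.5)
    count = [0] * n_bins        # bincount
    inv_sum = [0.0] * n_bins    # sum of 1/p over the entries of each bin
    for _ in range(n_steps):
        x_new = rng.random()    # uniform proposal, independent of x
        p_new = p(x_new)
        if rng.random() * px < p_new:
            x, px = x_new, p_new
        b = min(int(x * n_bins), n_bins - 1)
        count[b] += 1           # repeated points enter with multiplicity
        inv_sum[b] += 1.0 / px
    return [c / s for c, s in zip(count, inv_sum)]

estimate = weighted_chain_bins(n_steps=40000, n_bins=2)
exact = [0.75, 1.25]            # (1/V_bin) * integral of p over each bin
print(estimate, exact)
```

For two equal bins the exact bin averages of this density are 0.75 and 1.25, so the estimate can be compared directly with Eq. (A3).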

Note that this expression is only defined for $p \neq 0$. This means we need to correct for
regions where $p = 0$, as points in such regions will never enter the Markov chain. During
the evaluation of the Markov chain we store all suggested points which are rejected
because the probability is zero, and compute the correction factor

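$$ P_{\rm bin} = P_{\rm bin}(p \neq 0) \cdot \left( 1 - \frac{\sum_{i=1}^{\text{zerocount}} P(m \to m')^{-1}}{\text{zerocount} \cdot V_{\rm bin}} \right) . \qquad \text{(A4)} $$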

$P(m \to m')$ is the probability of suggesting $m'$ from $m$. For our Weighted-Markov-Chain
technique $m$ is the previous point in the Markov chain and $m'$ is the proposed point with
$p = 0$. $V_{\rm bin}$ is the volume of the bin. We need to show that the second term in the bracket
turns into $V_{\rm bin}(p = 0)/V_{\rm bin}$, the fraction of volume inside the bin where $p$ vanishes. To do this
we add an additional sum

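$$ \frac{\sum_{i=1}^{\text{zerocount}} P(m \to m')^{-1}}{\text{zerocount} \cdot V_{\rm bin}} = \frac{\sum_{i=1}^{\text{zerocount}} \sum_{j=1}^{k} P(m_j \to m'_j)^{-1}}{\text{zerocount} \cdot V_{\rm bin}} $$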

with $k = 1$, so that the single term of the new sum is the original one. We now take the
newly introduced sum in the numerator as
a very crude approximation to the corresponding Monte Carlo integral, effectively taking
the limit of infinite $k$. This is exactly the probability of hitting the region where $p = 0$
times the total volume, which is just $V_{\rm bin}(p = 0)$. $P(m_j \to m'_j)$ is the weight function of
the Monte Carlo integration. Canceling zerocount in numerator and denominator gives the
desired form.

So far, we have discussed this weighting technique using a probability p. Markov chains, 
however, are more general. They allow every function $f$ which is non-negative everywhere to
be used as potential, and SFitter uses $1/\chi^2$ as potential. It is easy to see that the expressions
given above remain valid, as the normalization constant drops out in the final results. The
resulting $P$ is then an average of $f$ over the bin. In the special case that $f$ is constant we
would obtain $f$ again.

For details of these Weighted Markov Chains (WMC), including their features under
marginalization, see Ref. [58].

APPENDIX B: TOY MODEL



To illustrate the SFitter results and output we use a simple toy model: we evaluate a
potential (likelihood) $V(m)$ over a 5-dimensional parameter space $m$. The potential has five
distinct maxima: a small and a large sphere, a cigar and two cuboids, one of which is tilted.
The background consists of a constant term and a flat parabola centered at the origin:
SFitter analyzes this parameter space using two approaches: first, we produce a set of
Markov chains sampling the entire parameter space as described in Appendix A, corresponding
to $p(m) = V(m)$. This means we produce a sample of 10^ points, distributed equally
over ten individual chains, which form a likelihood map of the parameter space $m$.

V = 75.1 ( 650 ±  16.3, 250 ±  16.3, 350 ±  16.3, 350 ±  16.3, 350 ±  16.3 )
V = 60.1 ( 850 ±   4.2, 225 ±   8.2, 650 ±   8.3, 650 ±   8.3, 650 ±   8.3 )
V = 25.1 ( 750 ±  10.0, 750 ±  10.0, 450 ±  29.9, 450 ±  29.9, 450 ±  29.9 )
V = 16.1 ( 250 ±  28.4, 250 ±  28.4, 550 ±  53.9, 550 ±  53.9, 550 ±  53.9 )
V = 12.1 ( 350 ± 120.0, 650 ± 119.9, 650 ± 119.9, 650 ± 119.9, 650 ± 119.9 )

FIG. 11: SFitter output for the 5-dimensional potential. First row: list of the largest values
of $V(m)$ in the entire parameter space. Second row: logarithmic map of $V(m_1, m_2)$, as a profile
likelihood (left), or marginalized over $m_{3,4,5}$ (right). Third row: distribution for $V(m_1)$, as a profile
likelihood (left) or marginalized over $m_{2,3,4,5}$ (right).


In a second step SFitter starts from the maxima in the Markov chain for $V(m)$ and
searches for the local maxima with improved resolution. For the Bayesian probability functions
this step is strictly speaking not necessary, as long as we are only interested in marginalized
distributions. On the other hand, we always want to have a good idea what structure
$V(m)$ exhibits over the parameter space and where its maxima are.

We eliminate local-maxima candidates if they are too close in parameter space and produce
the ranked list of the largest values of $V(m)$ in the 5-dimensional parameter space,
shown in Fig. 11. We see that as an isolated point the small sphere has the highest value of
$V(m)$.
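
One simple way to build such a ranked list is a greedy pass over the candidates, sketched below. The Euclidean distance criterion, the threshold `min_distance` and the candidate values are assumptions for illustration; the text does not specify how SFitter decides that two candidates are too close.

```python
import math

def ranked_maxima(candidates, min_distance):
    """Return candidates sorted by V, dropping those close to a better one.

    candidates: list of (V, point) pairs, point being a tuple of parameters.
    min_distance: hypothetical Euclidean separation below which two
    candidates are treated as the same maximum.
    """
    kept = []
    for v, point in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(math.dist(point, q) >= min_distance for _, q in kept):
            kept.append((v, point))
    return kept

# Made-up two-dimensional candidates, not SFitter output.
candidates = [
    (7.5, (800.0, 200.0)),
    (9.0, (600.0, 300.0)),
    (8.9, (602.0, 301.0)),   # same peak as the 9.0 entry
    (3.0, (700.0, 700.0)),
]
ranked = ranked_maxima(candidates, min_distance=20.0)
for v, point in ranked:
    print(v, point)
```

Because the candidates are visited in order of decreasing $V$, the entry that survives for each peak is always its highest one.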

Technically, because the resolution of the Markov chain will in general be too coarse
to match the data errors, we need an additional hill-climbing algorithm. We use a modified
version of Minuit [59]. For the gradient and diagonal second derivatives, we replace
the simple three-point formulae in the standard Minuit version with Ridders' method [60].
This algorithm starts with the three-point formulae using a large step size, then iteratively
shrinks the step size (typically by a factor of two) and computes an estimate using all points
calculated so far. The result of the three-point formula using only the new points is used
to estimate the calculation's uncertainty. The iterations terminate when the desired accuracy
is reached or when numerical uncertainties dominate for very small step sizes. In this
method, not only do all odd-power terms in the Taylor expansion of the derivative cancel, but
so do the leading even-power terms, which in turn improves the accuracy. In addition, the step
size is dynamically adjusted to its optimal value.
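
For orientation, a textbook-style version of Ridders' extrapolation for a first derivative can be written as follows. This is a sketch of the general technique, not the modified Minuit code used in SFitter; the parameter names and defaults (`shrink`, `max_iter`, `safe`) are assumptions.

```python
import math

def ridders_derivative(f, x, h, shrink=2.0, max_iter=10, safe=2.0):
    """Estimate f'(x) by Ridders' extrapolation of central differences.

    Returns (estimate, error estimate). h is the initial, deliberately
    large step; each iteration divides it by `shrink`.
    """
    shrink2 = shrink * shrink
    # table[j][i]: estimate after i step reductions and j extrapolations
    table = [[0.0] * max_iter for _ in range(max_iter)]
    table[0][0] = (f(x + h) - f(x - h)) / (2.0 * h)
    best, err = table[0][0], float("inf")
    for i in range(1, max_iter):
        h /= shrink
        table[0][i] = (f(x + h) - f(x - h)) / (2.0 * h)
        factor = shrink2
        for j in range(1, i + 1):
            # Cancel the next even-power term of the truncation error.
            table[j][i] = ((table[j - 1][i] * factor - table[j - 1][i - 1])
                           / (factor - 1.0))
            factor *= shrink2
            err_new = max(abs(table[j][i] - table[j - 1][i]),
                          abs(table[j][i] - table[j - 1][i - 1]))
            if err_new <= err:
                best, err = table[j][i], err_new
        # Stop once higher orders get worse, i.e. round-off dominates.
        if abs(table[i][i] - table[i - 1][i - 1]) >= safe * err:
            break
    return best, err

value, error = ridders_derivative(math.sin, 1.0, 0.5)
print(value, math.cos(1.0), error)
```

The returned error estimate is what allows the step size to be chosen dynamically: the iteration keeps the entry of the extrapolation table with the smallest estimated error.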

A slight complication arises from our box-shaped theory errors, because the function has 
a discontinuous second derivative. The Minos error estimate is in principle not affected by 
this, but this discontinuity breaks Ridders' algorithm: the higher derivative can now vastly 
differ between two neighboring points, and the terms listed above do not cancel any longer. 
To solve this problem, we replace the likelihood function by its original shape around the 
discontinuity: suppose the parameter point for which we want to compute the derivatives 
falls into the central region of eq.(1) where $\log \mathcal{L} = 0$. For the derivatives we always assume
$\log \mathcal{L} = 0$, no matter if the parameter point probed by Ridders' algorithm falls inside or
outside the flat region. Similarly, in case the parameter point we are interested in is on the
positive branch of the parabola, for the derivatives we just replace the flat region with the
opposite branch of the parabola. Note that this is only a technical trick to improve the
estimate of the derivative and that the calculated values of $\log \mathcal{L}$ are not used anywhere else.
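
The trick can be illustrated with a one-dimensional toy version. The sketch assumes that $-2 \log \mathcal{L}$ vanishes inside a box of half-width `sigma_theo` and rises as a parabola with width `sigma_exp` outside of it; this functional form, the function names and the numbers are assumptions made for the example.

```python
def chi2_flat(x, center, sigma_theo, sigma_exp):
    """Assumed form of -2 log L: zero inside the theory-error box,
    parabolic outside of it."""
    excess = abs(x - center) - sigma_theo
    return 0.0 if excess <= 0.0 else (excess / sigma_exp) ** 2

def chi2_for_derivative(x, x_ref, center, sigma_theo, sigma_exp):
    """Smooth stand-in used only while differentiating around x_ref."""
    offset = x_ref - center
    if abs(offset) <= sigma_theo:
        return 0.0  # x_ref in the flat region: treat the function as flat
    # x_ref on a parabola branch: continue that branch through the box.
    edge = center + sigma_theo if offset > 0.0 else center - sigma_theo
    return ((x - edge) / sigma_exp) ** 2

def second_derivative(f, x, h):
    """Plain three-point formula, enough to expose the problem."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Hypothetical numbers: box of half-width 1 around 0, experimental error 0.5.
x_ref, h = 1.05, 0.2   # the step reaches across the edge of the box
naive = second_derivative(lambda x: chi2_flat(x, 0.0, 1.0, 0.5), x_ref, h)
fixed = second_derivative(
    lambda x: chi2_for_derivative(x, x_ref, 0.0, 1.0, 0.5), x_ref, h)
print(naive, fixed)
```

With the smooth stand-in the three-point formula recovers the curvature of the parabola, $2/\sigma_{\rm exp}^2 = 8$ for these numbers, whereas the original function gives a step-size dependent value as soon as one probe point lands in the flat region.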

To reduce the number of dimensions over which we would like to compute a probability 
distribution we have three options: first we can simply slice the parameter space in $m_{3,4,5}$,
which is useful to illustrate the behavior of $V(m)$ but has no statistical meaning whatsoever.
Second, we can compute the profile likelihood described in Sec. III A, just projecting
out dimensions by replacing the reduced-dimensional value of $V$ by its maximum in the
removed dimensions. And finally, we can marginalize over the dimensions. Note that only
marginalization will produce a mathematically well-defined lower-dimensional probability
distribution. Technically, marginalization means nothing but binning the pdf and collecting
its values in a histogram for the two remaining dimensions $m_{1,2}$.

In the second row of Fig. 11 we immediately see that the small sphere appears more
prominent in the profile likelihood version of $V(m_1, m_2)$ while the large sphere dominates
the two-dimensional Bayesian distribution of $V(m_1, m_2)$. We see the same effect in the
one-dimensional distributions $V(m_1)$, where in the profile likelihood case one of the cuboids
appears prominently, as expected from the list of best values for $V(m)$. If $V$ were a pdf we
could conclude that the small sphere contains the most likely parameter points while the
large sphere is the most likely physics configuration. This dominance of the large sphere
over the most likely single point in the small sphere is an effect of the marginalization, i.e.
an example of a volume effect.
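
The volume effect is easy to reproduce on a small grid. The following sketch uses a made-up two-dimensional potential with a narrow, tall peak and a broad, low one, and reduces it to one dimension both ways; the potential and all names are assumptions for illustration and unrelated to the 5-dimensional toy model above.

```python
import math

def toy_potential(m1, m2):
    """Hypothetical V(m1, m2): a narrow tall peak and a broad low one."""
    narrow = 10.0 * math.exp(-((m1 - 20) ** 2 + (m2 - 50) ** 2) / (2.0 * 1.0 ** 2))
    broad = 4.0 * math.exp(-((m1 - 70) ** 2 + (m2 - 50) ** 2) / (2.0 * 10.0 ** 2))
    return narrow + broad

grid = range(100)

# Profile likelihood: keep the maximum over the removed dimension m2.
profile = [max(toy_potential(m1, m2) for m2 in grid) for m1 in grid]
# Marginalization: sum (integrate) over the removed dimension m2.
marginal = [sum(toy_potential(m1, m2) for m2 in grid) for m1 in grid]

best_profile = max(grid, key=lambda m1: profile[m1])
best_marginal = max(grid, key=lambda m1: marginal[m1])
print("profile peaks at m1 =", best_profile)
print("marginal peaks at m1 =", best_marginal)
```

The profile picks out the narrow peak at $m_1 = 20$ because it only looks at the height, while the marginalized distribution peaks at $m_1 = 70$ because the broad peak occupies far more volume in $m_2$.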

The question whether such volume effects should be considered, whether instead the best single point
is preferable, or whether actually the third-best point should be picked out by a theory bias cannot
and should not be answered by SFitter as a tool. Instead, SFitter provides all information
needed by the user to correctly answer each of these different questions.